Monday, January 18, 2016

MutablePair



  MutablePair (core/src/main/scala/org/apache/spark/util/MutablePair.scala)

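  - A mutable alternative to Tuple2 that minimizes object allocation: one instance can be updated in place and reused instead of allocating a new tuple for every element.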

case class MutablePair[@specialized(Int, Long, Double, Char, Boolean) T1,
                       @specialized(Int, Long, Double, Char, Boolean) T2]
  (var _1: T1, var _2: T2) extends Product2[T1, T2]

 - The @specialized annotation tells the compiler to generate specialized versions of the class for the listed primitive types, so that boxing is avoided.
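
As a rough illustration of what the annotation does, here is a minimal sketch. `Box` and `SpecializedDemo` are hypothetical names made up for this example and are not part of Spark. It assumes Scala 2, where `@specialized` makes the compiler emit extra variants of the class for the listed primitive types; printing the runtime class names is one way to see which variant was picked.

```scala
// Hypothetical class for illustration only; not part of Spark.
// Under Scala 2 the compiler generates extra variants of Box for Int, Long and Double.
class Box[@specialized(Int, Long, Double) T](val value: T)

object SpecializedDemo {
  def main(args: Array[String]): Unit = {
    val intBox = new Box(42)       // Int is in the specialized list
    val strBox = new Box("hello")  // String is not, so the generic class is used

    // Print the runtime class names to compare the two instances.
    println(intBox.getClass.getName)
    println(strBox.getClass.getName)

    // The value is used the same way in both cases.
    println(intBox.value + 1)
  }
}
```

The trade-off is more generated classes: each specialized type parameter multiplies the number of variants, which is presumably why the list is limited to a handful of primitives.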

def this() = this(null.asInstanceOf[T1], null.asInstanceOf[T2])

- Serialization requires a no-arg constructor. (In Java serialization, this requirement applies to the first non-serializable parent class.)

Usage example from Spark SQL:

rdd.mapPartitions { iter =>
  val getPartitionKey = getPartitionKeyExtractor()
  val mutablePair = new MutablePair[Int, InternalRow]()
  iter.map { row => mutablePair.update(part.getPartition(getPartitionKey(row)), row) }
}
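
The snippet above reuses one pair for the whole partition, which only works if each element is consumed before the next one is produced. The sketch below tries to show both sides of that. `SketchPair` is a simplified, hypothetical stand-in for MutablePair (no @specialized, no Product2), and `ReuseDemo` and `tagged` are names invented for this example. It assumes `update` overwrites both fields and returns the same instance, which is what the usage above relies on.

```scala
// Simplified, hypothetical stand-in for Spark's MutablePair (illustration only).
final case class SketchPair[T1, T2](var _1: T1, var _2: T2) {
  // Overwrites both fields and returns this same instance.
  def update(n1: T1, n2: T2): SketchPair[T1, T2] = {
    _1 = n1
    _2 = n2
    this
  }
}

object ReuseDemo {
  // Tags each word with its length, reusing a single pair for the whole iterator.
  def tagged(words: Iterator[String]): Iterator[SketchPair[Int, String]] = {
    val pair = SketchPair[Int, String](0, "")
    words.map(w => pair.update(w.length, w))
  }

  def main(args: Array[String]): Unit = {
    // Reading the fields of each element before advancing sees the right values.
    val streamed = tagged(Iterator("a", "bb", "ccc")).map(p => (p._1, p._2)).toList
    println(streamed)

    // Buffering the pairs themselves stores three references to one object,
    // so every entry shows the last values written.
    val buffered = tagged(Iterator("a", "bb", "ccc")).toList
    println(buffered)
  }
}
```

So the reuse trick saves one allocation per row, but anything downstream that holds on to the pairs (sorting, caching, collecting) needs its own copy of the data first.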


