Is there a known minimal set of lawful typeclasses that describe map/reduce? Here is my sketch at it: https://gist.github.com/johnynek/6e45ec989fb54e9cec51
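One common framing of this question (a sketch under my own assumptions, not the contents of the linked gist): map-side aggregation is lawful when the reduce function forms a commutative semigroup — associativity lets partial results be combined in any grouping, and commutativity lets the shuffle deliver values in any order. A plain-Scala illustration:

```scala
// Sketch: the minimal law set for a map/reduce aggregation is arguably
// a commutative semigroup on the value type. Names here (MapReduce,
// CommutativeSemigroup) are illustrative, not from any library.
trait CommutativeSemigroup[A] {
  // Must satisfy: combine(combine(x, y), z) == combine(x, combine(y, z))
  // and combine(x, y) == combine(y, x).
  def combine(x: A, y: A): A
}

object MapReduce {
  // Map each input to a (key, value) pair, then reduce per key.
  // Because combine is associative and commutative, partial reduction
  // on the "map side" and arbitrary shuffle order are both safe.
  def mapReduce[A, K, V](input: Seq[A])(mapper: A => (K, V))(
      implicit S: CommutativeSemigroup[V]): Map[K, V] =
    input.map(mapper).groupMapReduce(_._1)(_._2)(S.combine)
}

object MapReduceDemo {
  implicit val intAdd: CommutativeSemigroup[Int] = (x: Int, y: Int) => x + y

  def main(args: Array[String]): Unit = {
    // Word count: each word maps to (word, 1), counts merge via +.
    val counts = MapReduce.mapReduce(Seq("a", "bb", "a"))(w => (w, 1))
    println(counts) // contains "a" -> 2 and "bb" -> 1
  }
}
```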
@avibryant @posco @dxbydt_jasq It doesn't sort. The shuffle recently (https://databricks.com/blog/2014/10/10/spark-petabyte-sort.html) started being sort-based, but reduceByKey...
@avibryant @posco @dxbydt_jasq (the seeming closest analog to reducers) combines on the map side.
@avibryant @posco @dxbydt_jasq So, ordering is irrelevant. groupByKey doesn't guarantee any ordering.
@avibryant @posco @dxbydt_jasq Also, all the values associated with a key after groupByKey must fit in memory.
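The distinction above can be modeled with plain Scala collections (a sketch mirroring the Spark API names, not Spark itself): reduceByKey needs only a merge function it can apply in any order, including map-side before the shuffle, while groupByKey materializes every value for a key at once — which is why they must all fit in memory.

```scala
// Minimal model of reduceByKey vs groupByKey semantics using plain
// Scala collections (not Spark). Illustrative only.
object ShuffleSemantics {
  // reduceByKey: merges values per key with a function assumed to be
  // commutative and associative, so pre-shuffle ("map-side") combining
  // is safe and value ordering never matters.
  def reduceByKey[K, V](pairs: Seq[(K, V)])(merge: (V, V) => V): Map[K, V] =
    pairs.groupMapReduce(_._1)(_._2)(merge)

  // groupByKey: gathers *all* values for a key into one collection,
  // which in Spark must fit in a single task's memory.
  def groupByKey[K, V](pairs: Seq[(K, V)]): Map[K, Seq[V]] =
    pairs.groupMap(_._1)(_._2)

  def main(args: Array[String]): Unit = {
    val data = Seq(("a", 1), ("b", 2), ("a", 3))
    println(reduceByKey(data)(_ + _)) // "a" -> 4, "b" -> 2
    println(groupByKey(data))         // "a" -> all of its values at once
  }
}
```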
@j_houg @posco @dxbydt_jasq So join is fundamental, then? It can't be expressed in terms of the other primitives?
@avibryant @posco @dxbydt_jasq You end up cogrouping two pairRDDs and specifying a partitioner. https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala#L481
@avibryant @posco @dxbydt_jasq The partitioner is for the output. If the input RDDs are already partitioned the same way, the join goes faster.
@avibryant @posco @dxbydt_jasq So, I guess the primitive used there is cogroup, with added info about partitioning so data is task-local.
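That "join is just cogroup" observation can be sketched in plain Scala (a model of the shape of Spark's PairRDDFunctions.join, not its actual code — it ignores partitioners): cogroup gathers both sides' values per key, and join flat-maps the cross product, dropping keys missing from either side.

```scala
// Plain-Scala model of expressing join via the cogroup primitive.
// Illustrative only; real Spark does this lazily and per-partition.
object JoinViaCogroup {
  // cogroup: for every key on either side, collect the values from
  // the left and the right into a pair of sequences.
  def cogroup[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)]): Map[K, (Seq[V], Seq[W])] = {
    val l = left.groupMap(_._1)(_._2)
    val r = right.groupMap(_._1)(_._2)
    (l.keySet ++ r.keySet).map { k =>
      k -> (l.getOrElse(k, Seq.empty), r.getOrElse(k, Seq.empty))
    }.toMap
  }

  // (inner) join: keep keys present on both sides and emit every
  // combination of their values.
  def join[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)]): Seq[(K, (V, W))] =
    cogroup(left, right).toSeq.flatMap { case (k, (vs, ws)) =>
      for (v <- vs; w <- ws) yield (k, (v, w))
    }

  def main(args: Array[String]): Unit = {
    val users  = Seq((1, "ann"), (2, "bob"))
    val orders = Seq((1, "book"), (1, "pen"), (3, "mug"))
    // Key 1 pairs "ann" with both orders; keys 2 and 3 drop out.
    println(join(users, orders))
  }
}
```

Spark's extra twist, per the thread: passing a partitioner so that when both inputs are already partitioned the same way, the cogroup happens without moving data.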