I want to write a blog post called "Hadoop is slow" but I still don't really understand if/why Hadoop is slow. halp. https://gist.github.com/jvns/9381521489a99e888c0e …
@peterseibel @ansate @b0rk @PDXBigData I think this was just a slip, but to clear up any confusion: the reducers sort, not the mappers.
@avibryant @peterseibel @ansate @b0rk @PDXBigData reducers merge. mappers do the sort.
@squarecog @peterseibel @ansate @b0rk oh, thanks. So the reducers just do a single pass merge of the sorted output from each mapper?
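To make the mapper/reducer split concrete, here is a toy word-count simulation in Python. It is a sketch, not Hadoop's code, and it ignores partitioning across multiple reducers: each mapper sorts its own (key, value) output, and the reducer does a k-way merge of those already-sorted runs with `heapq.merge`, then groups by key. The names `map_phase` and `reduce_phase` are hypothetical. Real reducers may merge in several rounds when there are more segments than the merge factor allows; this sketch merges everything in one pass.

```python
import heapq
from itertools import groupby
from operator import itemgetter

def map_phase(lines):
    """Hypothetical mapper: emit (word, 1) pairs, sorted by key before the reducer fetches them."""
    pairs = [(word, 1) for line in lines for word in line.split()]
    return sorted(pairs, key=itemgetter(0))

def reduce_phase(sorted_runs):
    """Hypothetical reducer: k-way merge of the sorted runs (no re-sort needed), then sum per key."""
    merged = heapq.merge(*sorted_runs, key=itemgetter(0))
    return {key: sum(v for _, v in group) for key, group in groupby(merged, key=itemgetter(0))}

# Two "mappers", each with its own input split.
mapper_inputs = [["b a c", "a"], ["c b", "a d"]]
runs = [map_phase(split) for split in mapper_inputs]
counts = reduce_phase(runs)
print(counts)  # {'a': 3, 'b': 2, 'c': 2, 'd': 1}
```

Because every run arrives sorted, the reducer only ever compares the heads of the runs, which is why it can merge rather than sort.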
@avibryant @peterseibel @ansate @b0rk wait till you learn about mapper spills and merges and serialization...
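Mapper spills can be sketched in the same toy style. Assumptions, loudly: the buffer size here is counted in records, whereas Hadoop sizes its sort buffer in megabytes (`mapreduce.task.io.sort.mb`), serializes records into it, and starts spilling once it passes a threshold; this toy skips serialization and spills only when completely full. `SpillingMapOutput` is a hypothetical name. The point is the shape: sort and spill a run every time the buffer fills, then merge all the spills into one sorted map output at the end.

```python
import heapq
from operator import itemgetter

class SpillingMapOutput:
    """Hypothetical toy model of a mapper's output buffer (not Hadoop's MapOutputBuffer)."""

    def __init__(self, capacity):
        self.capacity = capacity  # max records held in memory (a simplification of a byte budget)
        self.buffer = []
        self.spills = []          # each spill stands in for a sorted file on local disk

    def collect(self, key, value):
        self.buffer.append((key, value))
        if len(self.buffer) >= self.capacity:
            self._spill()

    def _spill(self):
        # Sort the in-memory records and "write" them out as one sorted run.
        if self.buffer:
            self.spills.append(sorted(self.buffer, key=itemgetter(0)))
            self.buffer = []

    def close(self):
        # Spill the remainder, then merge every spill into a single sorted output.
        self._spill()
        return list(heapq.merge(*self.spills, key=itemgetter(0)))

out = SpillingMapOutput(capacity=3)
for word in "the cat saw the dog and the bird".split():
    out.collect(word, 1)
final = out.close()
print(len(out.spills))          # 3
print([k for k, _ in final])    # ['and', 'bird', 'cat', 'dog', 'saw', 'the', 'the', 'the']
```

Each record is touched at least twice on the map side (once when spilled, once in the final merge), and in real Hadoop each of those passes also goes through disk and serialization.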
@squarecog @peterseibel @ansate @b0rk not to mention combiners. 'Course, @scalding can't use those.
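A combiner is essentially a reduce function run on the map side. For word count it can be applied to each sorted spill, collapsing repeated keys before anything is written or shuffled. `combine_sum` is a hypothetical helper. This only works because summation is associative and commutative: Hadoop may run a combiner zero, one, or several times, so the final result must not depend on how often it runs.

```python
from itertools import groupby
from operator import itemgetter

def combine_sum(sorted_run):
    """Hypothetical word-count combiner: each key appears once, with its values summed."""
    return [(key, sum(v for _, v in group))
            for key, group in groupby(sorted_run, key=itemgetter(0))]

spill = sorted([(w, 1) for w in "the cat saw the dog and the bird".split()], key=itemgetter(0))
combined = combine_sum(spill)
print(len(spill), "->", len(combined))  # 8 -> 6
print(combined)  # [('and', 1), ('bird', 1), ('cat', 1), ('dog', 1), ('saw', 1), ('the', 3)]
```

Note that the combiner needs its input grouped by key, which in this sketch comes from sorting the spill first.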
@avibryant @squarecog @peterseibel @ansate @b0rk wait scalding can't use combiners? Why?
@venusatuluri @avibryant @peterseibel @ansate @b0rk cascading has its own in-memory version to avoid IO overhead.
@squarecog @venusatuluri @avibryant actually, scalding has its own as well, and the typed API exposes it individually: sumByLocalKeys
@scalding @squarecog @avibryant Why have custom combiners at all? Does the default do wasteful IO? I thought combiners were in-process.
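On that last question: as far as I understand it, a Hadoop combiner does run inside the map task, but only after records have been serialized into the sort buffer and sorted, so it still pays for serialization, sorting, and deserialization. The in-memory alternative that Cascading and Scalding's `sumByLocalKeys` provide keeps partial sums in a hash map inside the mapper, so repeated keys are aggregated before they are serialized at all. The sketch below is a hypothetical illustration, not either library's implementation; in particular, `map_side_aggregate` flushes the whole map when it reaches `max_keys`, while the real libraries use their own cache and eviction policies.

```python
from collections import Counter

def map_side_aggregate(words, max_keys):
    """Hypothetical hash-based map-side aggregation with a bounded in-memory dict."""
    cache = {}
    emitted = []
    for word in words:
        if word not in cache and len(cache) >= max_keys:
            emitted.extend(cache.items())  # flush partial sums downstream
            cache.clear()
        cache[word] = cache.get(word, 0) + 1
    emitted.extend(cache.items())          # flush whatever is left at the end
    return emitted

words = "the cat saw the dog and the bird".split()
partials = map_side_aggregate(words, max_keys=4)

# Downstream, partial sums for the same key are simply added together again.
totals = Counter()
for word, n in partials:
    totals[word] += n
print(len(words), len(partials))  # 8 7
print(totals["the"])              # 3
```

With only 8 records the saving is one record, but the trade-off is visible: a bigger `max_keys` emits fewer partial sums at the cost of more memory in the mapper, and nothing is sorted or serialized until a flush.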