I want to write a blog post called "Hadoop is slow" but I still don't really understand if/why Hadoop is slow. halp. https://gist.github.com/jvns/9381521489a99e888c0e …
@squarecog @peterseibel @ansate @b0rk not to mention combiners. 'Course, @scalding can't use those.
@avibryant @squarecog @peterseibel @ansate @b0rk wait scalding can't use combiners? Why?
@venusatuluri @avibryant @peterseibel @ansate @b0rk cascading has its own in-memory version to avoid IO overhead.
@squarecog @venusatuluri @avibryant actually, scalding has its own as well, and the typed API exposes it individually: sumByLocalKeys
@scalding @squarecog @avibryant Why have custom combiners at all? Does the default do wasteful IO - I thought combiners were in-process.
@venusatuluri @scalding @avibryant it does wasteful io. There are pros and cons to its approach but I generally find more cons than pros.
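
For anyone following along, here is a rough sketch of the "in-memory version" idea from the thread: sum values per key in a small bounded cache inside the map task, and only emit a partial sum when a key gets evicted. This is plain Python for illustration, not Cascading's or Scalding's actual code. The names `map_side_sum`, `reduce_sum`, and `max_keys`, and the choice of LRU eviction, are assumptions made up for the example.

```python
from collections import OrderedDict


def map_side_sum(pairs, max_keys=2):
    """Partially sum values per key in a bounded in-memory cache.

    Yields (key, partial_sum) pairs. A key can be emitted more than once
    (once per eviction plus the final flush), so a reducer still has to
    do the final sum. max_keys is a hypothetical cache-size knob.
    """
    cache = OrderedDict()
    for key, value in pairs:
        if key in cache:
            # Key already cached: fold the value in, no record emitted.
            cache[key] += value
            cache.move_to_end(key)
        else:
            cache[key] = value
            if len(cache) > max_keys:
                # Cache is full: emit the least recently used partial sum.
                yield cache.popitem(last=False)
    # End of input: flush whatever is still cached.
    yield from cache.items()


def reduce_sum(pairs):
    """Final reduce-side sum over (key, value) pairs."""
    totals = {}
    for key, value in pairs:
        totals[key] = totals.get(key, 0) + value
    return totals


records = [("a", 1), ("b", 1), ("a", 1), ("c", 1), ("a", 1), ("b", 1)]
partial = list(map_side_sum(records, max_keys=2))
print(len(records), "records in,", len(partial), "records out of the map side")
print(reduce_sum(partial))
```

The point of the sketch is only that the partial sums are built from in-memory objects before anything is written out, so fewer records reach the shuffle, while the reducer's final answer stays the same.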