I am super stoked on the idea of sampled data à la BlinkDB, but what about joins?
Replying to @avibryant
@avibryant yeah, it looks like that's the only answer. Ugh, but we have no foreign keys, and many, many kinds of primary key relationships.
Replying to @abestanway
@avibryant unless maybe you wrote your job first, then did some kind of bespoke sampling? Doubtful it'd be faster than running the full job.
Replying to @abestanway
@abestanway so @rhall_cool built a Scalding tool that does exactly this: bespoke sampling for your job. Helps on subsequent runs and similar jobs.
8:54 AM - 26 Aug 2013
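The thread never spells out which "only answer" it is referring to. One common way to keep joins valid under sampling (assumed here, not stated in the thread) is to sample every table on a deterministic hash of the shared join key. A key kept on one side is then kept on every side, so the join of the samples is exactly the full join restricted to the sampled keys. A minimal Python sketch; `keep_key`, `sample_on_key`, `hash_join`, and the `user_id` field are hypothetical names for illustration:

```python
import hashlib


def keep_key(key, rate):
    """Deterministically decide whether a join key belongs to the sample.

    An MD5 digest is used so the decision is stable across processes and
    machines (Python's built-in hash() is salted per process for strings).
    """
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # integer in [0, 2**64)
    # Keep the key if its bucket falls in the lowest `rate` fraction.
    return bucket < rate * 2**64


def sample_on_key(rows, key_field, rate):
    """Keep every row whose join key is sampled (order is preserved)."""
    return [row for row in rows if keep_key(row[key_field], rate)]


def hash_join(left, right, key_field):
    """Inner equi-join of two lists of dicts on `key_field`."""
    index = {}
    for rrow in right:
        index.setdefault(rrow[key_field], []).append(rrow)
    return [(lrow, rrow) for lrow in left for rrow in index.get(lrow[key_field], [])]


# Hypothetical data: 1,000 users, 5 events each.
users = [{"user_id": i} for i in range(1000)]
events = [{"user_id": i % 1000, "n": i} for i in range(5000)]

rate = 0.1
sampled_join = hash_join(
    sample_on_key(events, "user_id", rate),
    sample_on_key(users, "user_id", rate),
    "user_id",
)
# Scale a count from the sample back up to estimate the full join size.
estimated_full_count = len(sampled_join) / rate
```

The catch is the one raised in the thread: this only works when every table being joined shares the same key. With no foreign keys and many different primary-key relationships, no single key can be sampled consistently across all joins, which is where job-specific ("bespoke") sampling comes in.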