I am super stoked on the idea of sampled data à la BlinkDB, but what about joins?
@avibryant yeah, it looks like that's the only answer. Ugh, but we have no foreign keys, and many, many kinds of primary key relationships.
@avibryant unless maybe you wrote your job first, then did some kind of bespoke sampling? Doubtful it'd be faster than running the full job.
@abestanway so @rhall_cool built a Scalding tool that does exactly this: bespoke sampling for your job. It helps on subsequent runs and on similar jobs.