Feel very foolish: I've been doing data analysis and viz for Quantum Country "in the cloud" using BigQuery for the past few years, because it's what I was used to from KA. Big data! But we only have a few M samples. I can fit it all in RAM! I'm *so* much faster iterating locally.
This is my first time using an R notebook for something serious, and it is certainly quite a powerful tool for thought. Visualizations which would have taken me a day to finagle in BQ/GDS now take me a few minutes. This makes me ask/answer different questions…
Even with all the intermediate tables and computations, the whole notebook's environment only consumes 1/8 of my system's RAM. Such a classic mistake to have made.
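(A rough back-of-envelope, with made-up dimensions, just to show why it fits so easily:

  # Hypothetical shape: 5M rows x 30 numeric (double) columns, 8 bytes each.
  rows <- 5e6
  cols <- 30
  rows * cols * 8 / 2^30
  #> ~1.1 GiB of raw data -- comfortably in RAM on an ordinary laptop.
)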
One subtle point that's making me faster in R: BQ tries hard not to let you do inefficient things in your queries. So often you have to contort yourself unnaturally to express what you mean. But duh—with a few M samples I can just burn the cycles and program naturally. It's fine.
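A sketch of the kind of thing I mean (table and column names are made up): exact per-user medians are a computation BigQuery nudges you away from, where you would usually reach for APPROX_QUANTILES or restructure the query, but over a few million in-memory rows the naive expression just runs.

  # `reviews`, `user_id`, and `response_ms` are hypothetical names.
  # Exact per-user medians, the straightforward way; fine at this scale.
  per_user_median <- sapply(
    split(reviews$response_ms, reviews$user_id),
    median, na.rm = TRUE
  )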
It's funny that this big epiphany is really about my data set being puny compared to "real" big data… and yet, multi-million sample experimental analyses are pretty rare in edtech research.
If there's anything I've learned in the past 4 years working for a "Big Data Lab", it's that pretty much nobody has Big Data. Obligatory: frankmcsherry.org/graph/scalabil
And even then, a sample size of a few M would seem to cover 99%+ of applications.
Instead of being miserable with BQ, why don’t big co data scientists download a few M random rows and analyze these in RAM?
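(One possible sketch of that workflow, using the bigrquery package; the project, dataset, and table names below are placeholders:

  library(bigrquery)

  # RAND() < 0.01 takes a ~1% uniform sample; tune the fraction so the
  # result lands at a few million rows.
  sql <- "
    SELECT *
    FROM `my-project.my_dataset.events`
    WHERE RAND() < 0.01
  "
  tb <- bq_project_query("my-project", sql)
  events <- bq_table_download(tb)  # now a local data frame to iterate on in RAM
)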
Probably they do, and I'm just dumb!