Feel very foolish: I've been doing data analysis and viz for Quantum Country "in the cloud" using BigQuery for the past few years, because it's what I was used to from KA. Big data! But we only have a few M samples. I can fit it all in RAM! I'm *so* much faster iterating locally.
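Roughly what "fit it all in RAM" looks like, assuming a hypothetical reviews.csv export (the file name and readr usage are illustrative, not the actual pipeline):

  library(readr)  # fast CSV reading

  # Load the whole dataset into memory; a few million rows is nothing.
  reviews <- read_csv("reviews.csv")

  # Sanity-check the in-memory footprint.
  print(object.size(reviews), units = "auto")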
This is my first time using an R notebook for something serious, and it is certainly quite a powerful tool for thought. Visualizations that would have taken me a day to finagle in BQ/GDS now take me a few minutes. This makes me ask/answer different questions…
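A sketch of the kind of plot that now takes minutes, assuming hypothetical column names on the reviews table loaded above:

  library(ggplot2)

  # Mean accuracy by days since a question was first reviewed.
  # (days_since_first_review and accuracy are illustrative columns.)
  ggplot(reviews, aes(x = days_since_first_review, y = accuracy)) +
    stat_summary(fun = mean, geom = "line") +
    labs(x = "Days since first review", y = "Mean accuracy")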
Even with all the intermediate tables and computations, the whole notebook's environment only consumes 1/8 of my system's RAM. Such a classic mistake to have made.
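One way to check that footprint from inside the notebook, in base R with nothing assumed beyond the global environment:

  # Total size of every object in the global environment, in GB.
  gb <- sum(vapply(ls(envir = globalenv()),
                   function(nm) as.numeric(object.size(get(nm, envir = globalenv()))),
                   numeric(1))) / 1024^3
  cat(sprintf("Notebook environment: %.2f GB\n", gb))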
One subtle point that's making me faster in R: BQ tries hard not to let you do inefficient things in your queries. So you often have to contort yourself unnaturally to express what you mean. But duh: with a few M samples I can just burn the cycles and program naturally. It's fine.
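For instance, a plain per-user computation like this is gnarly to express in SQL (gaps-and-islands territory) but trivial in R at this scale; column names are illustrative:

  # Longest streak of consecutive correct answers per user.
  per_user <- split(reviews, reviews$user_id)
  streaks <- sapply(per_user, function(df) {
    df <- df[order(df$reviewed_at), ]
    runs <- rle(df$accuracy == 1)
    max(c(0L, runs$lengths[runs$values]))
  })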
It's funny that this big epiphany is really about my data set being puny compared to "real" big data… and yet, multi-million sample experimental analyses are pretty rare in edtech research.
Extra bonus lol: R is not even very efficient! Basically all the math I'm doing is vectorized, but it's still running on just one CPU core (I have 10). And guess what: doesn't matter.
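E.g., timing a fully vectorized transform over 5M synthetic samples on one core (the numbers here are made up for illustration):

  # Exponential forgetting-curve transform over 5M made-up intervals.
  intervals <- runif(5e6, min = 1, max = 120)  # days
  system.time(retention <- exp(-intervals / 30))  # finishes well under a second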
