Feel very foolish: I've been doing data analysis and viz for Quantum Country "in the cloud" using BigQuery for the past few years, because it's what I was used to from KA. Big data! But we only have a few M samples. I can fit it all in RAM! I'm *so* much faster iterating locally.
This is my first time using an R notebook for something serious, and it is certainly quite a powerful tool for thought. Visualizations which would have taken me a day to finagle in BQ/GDS now take me a few minutes. This makes me ask/answer different questions…
Even with all the intermediate tables and computations, the whole notebook's environment only consumes 1/8 of my system's RAM. Such a classic mistake to have made.
One subtle point that's making me faster in R: BQ tries hard not to let you do inefficient things in your queries. So often you have to contort yourself unnaturally to express what you mean. But duh—with a few M samples I can just burn the cycles and program naturally. It's fine.
It's funny that this big epiphany is really about my data set being puny compared to "real" big data… and yet, multi-million sample experimental analyses are pretty rare in edtech research.
Extra bonus lol: R is not even very efficient! Basically all the math I'm doing is vectorized, but it's still just running against one CPU core (I have 10). And guess what: doesn't matter.
Wasn't there that one paper (we need your help!) that shows that a single chonky machine blows Spark et al. out of the water (in many cases)? Back at the startup we definitely swore by this, squeezing a single Postgres instance endlessly before giving in to fancier solutions.
Yeah, that point about Postgres is exactly what I was thinking when I realized that I could do this. So funny that I made the mistake I chastise others for making. Poor transfer learning!
My compiled-in assumption is that computers run at 1 MIPS and have 1 MB of RAM, because that’s what I did my PhD on. Optimizations necessary to make code run on that “normal” sort of computer constantly come to mind and I have to keep saying NO, COMPUTERS ARE INFINITELY FAST