Conversation

Feel very foolish: I've been doing data analysis and viz for Quantum Country "in the cloud" using BigQuery for the past few years, because it's what I was used to from KA. Big data! But we only have a few M samples. I can fit it all in RAM! I'm *so* much faster iterating locally.
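A minimal sketch of that local workflow, assuming hypothetical project and table names (the thread doesn't show the real schema): pull the whole table into a tibble once with bigrquery, then iterate entirely in RAM.

```r
library(bigrquery)
library(dplyr)

# Hypothetical project / table names, purely illustrative.
tb <- bq_project_query(
  "my-gcp-project",
  "SELECT user_id, question_id, answered_at, correct FROM reviews.attempts"
)

# A few million rows downloads into a local tibble that fits easily in RAM.
attempts <- bq_table_download(tb)

# From here on, every iteration is local and fast.
attempts %>%
  count(user_id) %>%
  summarise(median_reviews_per_user = median(n))
```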
One subtle point that's making me faster in R: BQ tries hard not to let you do inefficient things in your queries. So often you have to contort yourself unnaturally to express what you mean. But duh—with a few M samples I can just burn the cycles and program naturally. It's fine.
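As one illustration of "program naturally": the kind of grouped, ordered computation below reads straightforwardly in dplyr but tends to demand window-function contortions in warehouse SQL. Column names are the same hypothetical ones as in the sketch above.

```r
library(dplyr)

# Per-user, per-question review intervals, written the obvious way.
intervals <- attempts %>%
  arrange(user_id, question_id, answered_at) %>%
  group_by(user_id, question_id) %>%
  mutate(days_since_last = as.numeric(answered_at - lag(answered_at), units = "days")) %>%
  ungroup()

# Accuracy as a function of review interval, crudely bucketed.
intervals %>%
  filter(!is.na(days_since_last)) %>%
  mutate(interval_bucket = cut(days_since_last, breaks = c(0, 1, 3, 7, 14, 30, Inf))) %>%
  group_by(interval_bucket) %>%
  summarise(accuracy = mean(correct), n = n())
```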
It's funny that this big epiphany is really about my data set being puny compared to "real" big data… and yet, multi-million sample experimental analyses are pretty rare in edtech research.
Extra bonus lol: R is not even very efficient! Basically all the math I'm doing is vectorized, but it's still just running against one CPU core (I have 10). And guess what: doesn't matter.
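A toy illustration of why the single core doesn't matter at this scale (timings obviously vary by machine): a few million element-wise operations in base R run single-threaded and still finish in a fraction of a second.

```r
# Vectorized but single-threaded: a few million element-wise operations.
x <- rnorm(5e6)

system.time({
  density <- exp(-x^2 / 2) / sqrt(2 * pi)  # standard normal density, computed on one core
})
# Typically well under a second on a recent laptop, ignoring the other cores entirely.
```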
Possibly, but IME so far, improvement has mostly come from running more interesting / thoughtful experiments, rather than analyzing the data I have.
Replying to
My analysis workflows usually go the other way around: Python in a local notebook first, SQL in a data warehouse later. Even if the data is too big for the local machine, it pays to take random samples at first and play with them with a pandas- or dplyr-style API!
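A sketch of that sample-first workflow, again with hypothetical project and table names: pull a small random slice (e.g. with BigQuery's TABLESAMPLE) to explore locally, and only promote the query to warehouse SQL once the question is settled.

```r
library(bigrquery)
library(dplyr)

# Hypothetical project / table; grab a small random sample for local exploration.
sample_tb <- bq_project_query(
  "my-gcp-project",
  "SELECT * FROM reviews.attempts TABLESAMPLE SYSTEM (1 PERCENT)"
)
sample_df <- bq_table_download(sample_tb)

# Play with the sample using a dplyr-style API before writing warehouse SQL.
sample_df %>%
  group_by(question_id) %>%
  summarise(accuracy = mean(correct), attempts = n())
```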
Replying to
R is so underrated. I used to work as a Data Engineer, all in R. Turns out R is really nice to work with, especially armed with the tidyverse. Didn't miss Python, but whenever I needed a Python package, reticulate worked well.
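For example, a minimal reticulate sketch, assuming a Python environment with scikit-learn installed (the specific package is just an illustration):

```r
library(reticulate)

# Borrow a Python package from R when there's no convenient R equivalent.
sklearn_lm <- import("sklearn.linear_model")

X <- as.matrix(mtcars[, c("wt", "hp")])
y <- mtcars$am

model <- sklearn_lm$LogisticRegression()
model$fit(X, y)
model$predict(X)
```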