The author Leo Breiman (who you may know from such greatest hits as bagging and random forests) talks about going from stats academia to working in industry as a statistical consultant. When he eventually returned to academia, he experienced a sort of reverse culture shock.
He frames academia and industry as different cultures of statistical modeling. Both share the goal of understanding the relationship between some input variables X and an output variable Y. The true relationship is unknown, a black box, but statisticians in both camps aim to approximate it. pic.twitter.com/Cfi7gBp0hV
The paper describes the data modeling (academic stats) culture as assuming that the black box is fundamentally orderly, stochastic, and parametric. The other culture (algorithmic modeling) isn't terribly interested in what's in the black box, focusing instead on predictive accuracy. pic.twitter.com/mZWCki8BJE
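The split Breiman draws can be sketched in a few lines. This toy example is my own illustration, not from the paper or the thread: a parametric linear fit (data modeling culture, where the fitted coefficients are read as statements about the black box) and an assumption-free k-nearest-neighbors procedure (algorithmic modeling culture), both judged by error on held-out data.

```python
# Illustrative sketch of Breiman's two cultures on a toy problem.
# The data and models here are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)

# Nature's black box (unknown to the modeler): y = sin(3x) + noise.
X = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(3 * X[:, 0]) + rng.normal(0, 0.1, size=200)

# Held-out test set -- predictive accuracy on it is the judge.
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

# Data modeling culture: assume a parametric (linear) form for the
# black box and fit it by least squares, then interpret the coefficients.
A = np.hstack([X_train, np.ones((len(X_train), 1))])
beta, *_ = np.linalg.lstsq(A, y_train, rcond=None)
lin_pred = np.hstack([X_test, np.ones((len(X_test), 1))]) @ beta

# Algorithmic modeling culture: no assumed form, just a procedure
# (k-nearest-neighbors averaging) evaluated purely on test error.
def knn_predict(x, k=5):
    dists = np.abs(X_train[:, 0] - x)
    return y_train[np.argsort(dists)[:k]].mean()

knn_pred = np.array([knn_predict(x) for x in X_test[:, 0]])

lin_mse = np.mean((lin_pred - y_test) ** 2)
knn_mse = np.mean((knn_pred - y_test) ** 2)
print(f"linear test MSE: {lin_mse:.3f}, kNN test MSE: {knn_mse:.3f}")
```

On a black box this nonlinear, the parametric model's assumed form is wrong and the algorithmic model wins on the test set, which is essentially the argument Breiman builds out formally in the paper.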
After spending time in industry, Breiman finds himself more sympathetic to algorithmic modeling culture, and he gets into plenty of formal and technical reasons why. I'm not gonna get into them in a tweet thread, but I recommend checking out the paper if you're interested in them
Some of his reflections from working in industry resonate strongly with me:
* Focus on finding a good solution—that’s what consultants get paid for
* Live with the data before you plunge into modeling
Something that kind of caught me off guard was hearing a statistician in industry say:
* Search for a model that gives a good solution, either algorithmic or data
* Predictive accuracy on test sets is the criterion for how good the model is
The first of those two is a little less surprising to me, given that "a good solution" could mean a lot of different things, but the second both makes a lot of sense to me and is weird to think about.
That's probably because of the type of DS I am. I've done some forecasting and productionized precious few ML models, but I've mostly modeled to do what Breiman describes as "extracting information about how nature is associating response variables to input variables"
I perceive caring about accuracy first as ML eng territory, and I perceive MLE as stemming directly from computer science. This is sort of silly of me, given that "data scientist" has been an overloaded junk title for most of the time I've held it (and if I'm honest still is).
Specialization within the DS world is still emerging, and while some of the boundaries between types of DS roles have gotten sharper in the last few years, they're all still coming from the same lineage.
"Terabytes of data are pouring into computers from many sources, both scientific, and commercial, and there is a need to analyze and understand the data," Breiman says, like a prophet foretelling HBR articles to come.
And reflecting on his work in the 90's (!!!), he was already seeing how cross-functional this line of work can be: "there has been a noticeable move toward statistical work on real world problems and reaching out by statisticians toward collaborative work with other disciplines"
The problems we're solving with data today are greater in scale and complexity, which means we're getting the luxury of focusing on narrower subsets of those problems. Now we ask for analysts, scientists, MLEs, analytics engineers, etc. instead of just DSes or statisticians.