Sorry, Ethan. You're misinterpreting the paper, which suffers from limited external validity.
This is 3/4 of businesses that used @optimizely back in 2014, a very skewed population. Optimizely got it wrong initially, then fixed it. See
https://www.amazon.com/gp/customer-reviews/R44BH2HO30T18/ref=cm_cr_dp_d_rvw_ttl?ie=UTF8&ASIN=B00E1JO50M
-
-
-
Good criticism, but I don't think it's a misinterpretation. You argue that Optimizely pushed people to stop tests early & that experimenters have gotten better in the last few years. Those should lower prevalence (though I'm not sure to what degree) & bad practice is still a problem.
-
Agree that early stopping is still a problem, but not at this scale. The dataset used is from naïve experimenters who used a platform that encouraged this in 2014 (since fixed). That's like claiming that 75% of cars need new tires, when the data comes from http://tirerack.com .
End of conversation
New conversation -
-
-
yea, it's called peeking and it's a well-known problem in A/B experimentation, one that data science teams are keenly aware of. the data for this paper dates back to 2014 and comes from a platform that makes A/B tests accessible to non-technical folks - i.e. not data scientists
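A minimal sketch of the peeking problem described above, under assumed numbers (identical 10% conversion rate in both arms, 20 interim looks of 500 users per arm, so any declared winner is a false positive): stopping at the first p < 0.05 produces far more than the nominal 5% of false winners.

```python
# Simulation of "peeking": checking significance after every batch of traffic
# and stopping at the first p < 0.05. Assumptions (not from the thread):
# identical 10% conversion in both arms, 20 looks of 500 users per arm.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def peeking_experiment(rate=0.10, batch=500, looks=20, alpha=0.05):
    """Return True if an experimenter who peeks declares a (false) winner."""
    conv_a = conv_b = n = 0
    for _ in range(looks):
        conv_a += rng.binomial(batch, rate)
        conv_b += rng.binomial(batch, rate)
        n += batch
        # Two-proportion z-test on the data accumulated so far.
        p_a, p_b = conv_a / n, conv_b / n
        p_pool = (conv_a + conv_b) / (2 * n)
        se = np.sqrt(2 * p_pool * (1 - p_pool) / n)
        p_value = 2 * stats.norm.sf(abs(p_a - p_b) / se)
        if p_value < alpha:
            return True  # stop early and ship the "winner"
    return False

runs = 2000
false_positive_rate = sum(peeking_experiment() for _ in range(runs)) / runs
print(f"False-positive rate with peeking: {false_positive_rate:.1%}")
```

The usual remedy is to commit to a sample size in advance, or to use a sequential procedure designed for repeated looks.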
-
Yes, fair criticism as to the extent of the problem: it might be much less (given the timeframe and platform) but might be more (many more people testing, much of it even less sophisticated). More research needed!
-
absolutely. more research and more attention needed, particularly in business schools, where those questions are largely ignored rn. also, anyone doing research on the topic should *talk to practitioners*. otherwise it feels disconnected from reality.
-
(full disclosure: I went to Booth 2012-2014 and took every stats class and never once heard this topic being discussed)
End of conversation
New conversation -
-
-
This is what happens when you slash R&D but still make wild promises to shareholders and the public regarding innovation.
-
-
-
Claims of widespread p-hacking were sensationalized in the paper, which suffers from a lack of external validity on two axes: skew due to massive selection bias, and time bias. See https://www.linkedin.com/pulse/p-hacking-ab-testing-sensationalized-ronny-kohavi/
-
are you saying they p-hacked the p-hack paper?
-
That's an orthogonal claim that others made: a p-value of 0.03 is the lowest in the table of multiple tests: https://twitter.com/deaneckles/status/1074845375096336385
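For illustration only, since the paper's actual p-value table is not reproduced in the thread: with made-up p-values in which 0.03 is the smallest of ten tests, a standard Bonferroni adjustment no longer treats it as significant at the 0.05 level.

```python
# Made-up p-values standing in for a table of multiple tests, with 0.03 as the
# smallest. Bonferroni divides the 0.05 threshold by the number of tests, and
# 0.03 no longer clears the adjusted cutoff.
p_values = [0.03, 0.08, 0.21, 0.34, 0.47, 0.55, 0.62, 0.71, 0.88, 0.95]
alpha = 0.05
threshold = alpha / len(p_values)  # 0.005 with 10 tests

for p in sorted(p_values):
    verdict = "significant" if p < threshold else "not significant"
    print(f"p = {p:.2f}: {verdict} after Bonferroni correction")
```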
-
aw man i was just making a joke
End of conversation
New conversation -
-
-
This surprises no one who has been doing analytics in a commercial setting
-
-
-
I’m starting to remodel my entrepreneurship classes to help students build the humility to accept that their initial assumptions were wrong.
-
That's the hard part. Right now I'm working on having them set up and run market experiments with pre-defined success thresholds.
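A rough sketch of what such a pre-defined success threshold might look like, assuming a 10% baseline conversion rate, a 2-percentage-point minimum detectable effect, and hypothetical outcome counts: fix the sample size and the decision rule before launch, then analyze exactly once.

```python
# Sketch of a pre-registered decision rule, not a prescription. Assumptions
# (not from the thread): 10% baseline conversion, 2-point minimum detectable
# effect, alpha = 0.05, 80% power; the outcome counts are hypothetical.
import numpy as np
from scipy import stats

baseline = 0.10            # assumed control conversion rate
mde = 0.02                 # smallest lift worth acting on (assumed)
alpha, power = 0.05, 0.80

# 1. Pre-register the sample size per arm (normal approximation, two proportions).
z_alpha = stats.norm.ppf(1 - alpha / 2)
z_power = stats.norm.ppf(power)
p_bar = baseline + mde / 2
n_per_arm = int(np.ceil((z_alpha + z_power) ** 2 * 2 * p_bar * (1 - p_bar) / mde ** 2))
print(f"Run each arm to n = {n_per_arm}, then stop and analyze once.")

# 2. Only at that sample size, apply the pre-defined rule (no interim peeks).
conv_treatment, conv_control = 453, 384  # hypothetical outcomes
p_t, p_c = conv_treatment / n_per_arm, conv_control / n_per_arm
p_pool = (conv_treatment + conv_control) / (2 * n_per_arm)
se = np.sqrt(2 * p_pool * (1 - p_pool) / n_per_arm)
p_value = 2 * stats.norm.sf(abs(p_t - p_c) / se)
verdict = "Ship the treatment" if p_value < alpha else "Keep the control"
print(f"{verdict} (p = {p_value:.3f})")
```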
End of conversation
New conversation -
-
-
Sadly, this does not surprise me in the slightest.
-
-
-
Not surprising. Even among SaaS companies, testing cultures are rare. Part of it is ignorance (not understanding why a test must be statistically significant). But a larger portion is that running a test that doesn't go as planned is seen as a bad thing.
-
-
-
Is there a compendium somewhere of bogus marketing and growth-hacking tactics that look statistically significant but in fact are worthless?