@marketsensei and I thought about this and concluded that the standard hypothesis test used to analyze A/B tests doesn’t fit the marketing problem we are usually trying to solve. 2/17
Hypothesis tests are used by academics who want to find small effects with high confidence, but in marketing we care about the big effects. Big effects are where the profit comes from! 3/17
In a tactical A/B test -- which we call a “Test & Roll” -- the marketer plans to randomly assign two treatments to a small test group and then use the data to choose one of the treatments to deploy to the rest of the population. 4/17
Marketers do this type of experiment all the time. For example, here is an email A/B testing tool from @CampaignMonitor. @HubSpot, @ConstantContact and @MailChimp all have similar A/B testing tools. 5/17 pic.twitter.com/s86lICH169
In a test & roll, there is a fundamental trade-off between the opportunity cost of the test and the potential to deploy the wrong treatment. With a few distributional assumptions, we can compute the expected profit of a test & roll for different test sizes. 6/17 pic.twitter.com/AVKnfgIuwh
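The trade-off can be illustrated with a small simulation. This is a minimal sketch, not the authors' code, and it assumes that each treatment's true mean response is drawn from a normal prior with mean `mu` and standard deviation `sigma`, that individual responses are normal with standard deviation `s`, and that response per customer stands in for profit. The function name and all input values are hypothetical.

```python
import random

def expected_profit(n, N, mu, sigma, s, reps=2000, seed=0):
    """Monte Carlo estimate of expected total response for a test of n per arm."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        # Draw each treatment's true mean response from the (assumed) prior.
        m1 = rng.gauss(mu, sigma)
        m2 = rng.gauss(mu, sigma)
        # Observed test averages: true mean plus sampling noise of sd s / sqrt(n).
        ybar1 = rng.gauss(m1, s / n ** 0.5)
        ybar2 = rng.gauss(m2, s / n ** 0.5)
        # Response collected during the test from the 2n test customers.
        test_part = n * (ybar1 + ybar2)
        # Roll: deploy the arm with the higher test average to everyone else,
        # counting its expected response given its true mean.
        winner = m1 if ybar1 >= ybar2 else m2
        total += test_part + (N - 2 * n) * winner
    return total / reps

# Hypothetical inputs, chosen only for illustration.
N, mu, sigma, s = 100_000, 0.68, 0.03, 0.47
candidates = [10, 100, 1_000, 2_000, 5_000, 20_000, 45_000]
profits = {n: expected_profit(n, N, mu, sigma, s) for n in candidates}
best = max(profits, key=profits.get)
print(best, round(profits[best]))
```

Very small tests tend to pick the wrong arm, and very large tests leave few customers to roll out to, so the estimate peaks somewhere in between. The grid search is noisy, so neighboring candidates may swap places from run to run.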
So, how big is the profit-maximizing test & roll? @marketsensei and I came up with a new sample size formula that maximizes expected profit. Here it is: 7/17 pic.twitter.com/E6v4hWfs3t
If you don’t like formulas, you can compute the profit-maximizing test & roll sample size at http://testandroll.com . We also provide R code at https://github.com/eleafeit/testandroll …. 8/17
So, what do you do when you get the results of the test? Simple: pick the treatment with the higher average response. No inconclusive tests! We test then roll in an intuitive way. 9/17
@Chris_Said developed a similar approach with encouragement from @johnvmcdonnell. He frames the problem around discounting, but the idea of trading opportunity cost against deployment errors is essentially the same. https://chris-said.io/2020/01/10/optimizing-sample-sizes-in-ab-testing-part-I/ … 10/17
Why should you stop doing hypothesis tests? Profit-maximizing tests are much smaller and generate more profit! In one of our case studies a hypothesis test requires a sample size of 18,468 per group while the profit-maximizing size is 2,284 and makes more profit. 11/17 pic.twitter.com/lzq2UIibMN
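For comparison, the textbook sample size for a two-sided, two-sample hypothesis test is n = (z_{1-alpha/2} + z_{power})^2 * 2 s^2 / d^2 per group, where `d` is the smallest difference the test should detect. The sketch below uses that standard formula with the conventional `alpha = 0.05` and `power = 0.80`. The thread does not state the case study's inputs, so the values for `s` and `d` are assumptions picked for illustration.

```python
import math
from statistics import NormalDist

def nhst_size(s, d, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided two-sample z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    return (z_alpha + z_beta) ** 2 * 2 * s ** 2 / d ** 2

# Assumed inputs: response sd for a 68% rate, and a 2% relative lift to detect.
n = nhst_size(s=math.sqrt(0.68 * 0.32), d=0.68 * 0.02)
print(math.ceil(n))  # 18468
```

The required size depends on `d` squared, so halving the detectable difference quadruples the test. That is why tests designed to find small effects get so large.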
A test & roll usually deploys the wrong treatment at a greater rate than a null hypothesis significance test (NHST). Is this bad? No, because the sample size has been optimized for profit. Errors are most likely to happen when the difference between treatments is small and doesn't affect profit much. 12/17
Our sample size formula also scales with the size of your population. So, if you have lots of customers/traffic, then you should run a bigger test, but marketers with small populations should still run tests. Tests always improve profit! 13/17 pic.twitter.com/sTFqua6Lhc
Computing a test & roll sample size requires information about the range of average response you expect from your treatments. Using data on the past performance of similar treatments, you can estimate priors with a hierarchical Bayes (HB) model using tools like @mcmc_stan and @pymc3. 14/17
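A full hierarchical Bayes fit is more than a short example can show. As a rough stand-in, the sketch below swaps in a simple method-of-moments estimate: the prior mean is the average of past response rates, and the prior standard deviation is the spread of those rates after subtracting binomial sampling noise. This is not the HB model mentioned above, and the past results are made up for illustration.

```python
from statistics import mean, variance

def moment_prior(rates, sizes):
    """Rough prior (mu, sigma) from past response rates and their sample sizes."""
    mu = mean(rates)
    # Average sampling variance of an observed rate (binomial approximation).
    noise = mean(r * (1 - r) / n for r, n in zip(rates, sizes))
    # Spread between treatments beyond what sampling noise alone explains.
    between = max(variance(rates) - noise, 0.0)
    return mu, between ** 0.5

# Hypothetical past treatments: observed response rate and customers tested.
past_rates = [0.62, 0.70, 0.66, 0.71, 0.68, 0.73]
past_sizes = [2000] * 6
mu_hat, sigma_hat = moment_prior(past_rates, past_sizes)
print(round(mu_hat, 3), round(sigma_hat, 3))
```

With only a handful of past treatments the estimate of `sigma` is very uncertain, which is one reason a hierarchical model is the better tool when enough data is available.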
Those familiar with multi-armed bandits may be wondering how this compares to a dynamic approach. We looked into that and found that an optimal-size test & roll (orange in plots) achieves nearly the same level of expected profit as Thompson Sampling (purple). 15/17 pic.twitter.com/ooiLwyMTSH
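For readers who have not seen it, here is a minimal sketch of two-armed Thompson Sampling. It is not the simulation behind the plots. It assumes normal responses with known standard deviation `s`, a shared normal prior, and posterior updates after each batch of customers. The batch size, function name and input values are hypothetical.

```python
import random

def thompson_two_arms(true_means, N, mu, sigma, s, batch=500, seed=0):
    """Allocate N customers across two arms by Thompson Sampling, in batches."""
    rng = random.Random(seed)
    post_mean = [mu, mu]                          # posterior mean per arm
    post_prec = [1 / sigma ** 2, 1 / sigma ** 2]  # posterior precision per arm
    pulls = [0, 0]
    profit = 0.0
    remaining = N
    while remaining > 0:
        b = min(batch, remaining)
        counts = [0, 0]
        for _ in range(b):
            # Sample a plausible mean for each arm and serve the larger one.
            d0 = rng.gauss(post_mean[0], post_prec[0] ** -0.5)
            d1 = rng.gauss(post_mean[1], post_prec[1] ** -0.5)
            counts[1 if d1 > d0 else 0] += 1
        for k in (0, 1):
            if counts[k] == 0:
                continue
            # Total response of this arm's customers in the batch.
            total = rng.gauss(counts[k] * true_means[k], s * counts[k] ** 0.5)
            profit += total
            # Conjugate normal update with known response variance s^2.
            new_prec = post_prec[k] + counts[k] / s ** 2
            post_mean[k] = (post_prec[k] * post_mean[k] + total / s ** 2) / new_prec
            post_prec[k] = new_prec
            pulls[k] += counts[k]
        remaining -= b
    return profit, pulls

# Hypothetical example in which the second arm is clearly better.
profit, pulls = thompson_two_arms([0.5, 0.9], N=20_000, mu=0.7, sigma=0.2, s=0.4)
print(round(profit), pulls)
```

Unlike a test & roll, the allocation keeps adapting for the whole campaign, which requires the ability to update the assignment rule as responses arrive.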
The paper is called “Test & Roll: Profit Maximizing A/B tests” and just came out @MarketingScience: https://pubsonline.informs.org/doi/10.1287/mksc.2019.1194 ….
@marketsensei wrote more about it at https://www.ron-berman.com/blog/ . 16/17 #MarketingAcad #EconTwitter #Measure #epitwitter
What else will you find in the paper? We show how to compute profit-maximizing sample sizes when one treatment is likely to perform better than the other. That is, we can tell you the optimal size of the holdout in media incrementality tests. Read the paper for the details. 17/17 pic.twitter.com/enaD1j4ebp