It's been coming up a lot lately, so I thought I'd do a bit of a thread on CONVENIENCE SAMPLES and why they aren't great for assessing POPULATION PREVALENCE of a disease. In other words: how many people have had COVID-19? 1/n
3/n The traditional method is to run a large, randomly-sampled study: dial up tens of thousands of people across a population, survey them, and do lots of blood tests. But this is EXPENSIVE
4/n Running a proper statistically representative process means getting all those people to answer their phones and give you bloods. Even if the cost per person is low, multiply that by 10,000-100,000 and the total can be prohibitive
5/n Which brings us to the idea of a CONVENIENCE SAMPLE. Why is it called a convenience sample? (Hint: the answer is in the name)
6/n Yes, convenience samples are just that: convenient. Usually, they are groups of people that you are ALREADY TESTING for some reason, so you can either add another test on or survey them
7/n I have used this method in the past to look at the burden of diabetes in hospitals and GP clinics. We looked at people who were already getting blood tests, and added one extra test for diabetes (and science!) https://www.sciencedirect.com/science/article/abs/pii/S0168822718318862 … pic.twitter.com/HoYla4gXSD
8/n But there's an issue here. We have selected these people very specifically. They are not a random, representative sample: they were people ALREADY GETTING blood tests, which means they are probably different in LOTS OF WAYS to the general population
9/n So in our lovely study of a convenience sample of diabetes tests, we can't say anything about how much diabetes there is in the community (population prevalence)! All we can talk about is diabetes IN THE PATIENTS TESTED
10/n "But God", you ask, with a common autocorrect mistake, "what does this have to do with COVID-19?" Well, reader, this is where we get to antibody testing pic.twitter.com/OdmE1jOyKX
11/n You see, when you get sick with a new disease, your body produces antibodies.* We can then test for these antibodies to see if you've had the disease before.* (*oversimplified, plz don't murder me, immunologists)
12/n If you run an antibody test on a large group of people, it's called a serosurvey (because antibody tests are also known as serology in sciency terms)
13/n Now, a lot of places (countries, states, colleges) have run serosurveys and had a grand old time of it. This is why you keep seeing those news articles saying that x% of people in a place have had COVID-19 already pic.twitter.com/PJbt4tWiYb
14/n The problem is, some of these serosurveys used CONVENIENCE SAMPLES. Just like we discussed earlier, that makes them a bit problematic
15/n My co-authors and I, in our systematic review of age-stratified IFRs for COVID-19, looked into just how problematic. The answer: a whole lot https://www.medrxiv.org/content/10.1101/2020.07.23.20160895v4 …
16/n For example, one study in Tokyo that used a CONVENIENCE SAMPLE found that 3.8% of the people tested had had COVID-19. But a proper randomized sample found just 0.1%, which is 38 times lower! pic.twitter.com/3qTdzCXVXX
17/n In England, a CONVENIENCE SAMPLE of blood donors implied that 1 in 12 people had had COVID-19, but a large representative sample found it was just 1 in 20 pic.twitter.com/XyTVSL6LDs
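To put the two comparisons side by side, here is a quick back-of-the-envelope sketch in Python. It only uses the figures quoted in the tweets above; the function name `overestimate_ratio` is just an illustrative label, not something from either study.

```python
# Figures as quoted in the thread (convenience sample vs representative sample).
tokyo_convenience = 0.038        # 3.8%
tokyo_random = 0.001             # 0.1%
england_donors = 1 / 12          # "1 in 12" blood donors
england_representative = 1 / 20  # "1 in 20" in the representative sample


def overestimate_ratio(convenience: float, representative: float) -> float:
    """How many times higher the convenience-sample figure is."""
    return convenience / representative


print(f"Tokyo:   {overestimate_ratio(tokyo_convenience, tokyo_random):.1f}x")
print(f"England: {overestimate_ratio(england_donors, england_representative):.2f}x")
```

The size of the gap is very different in the two places, which is part of the point: you can't just apply a standard correction factor to a convenience sample.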
18/n The problem is, these CONVENIENCE SAMPLES are systematically biased. They are of people who are different to the general population in ways that can be very difficult to measure and/or understand
19/n Blood donors, for example, are young and healthy by design. But the people who have been (generously) giving blood during the pandemic might also be...well, a bit odd
20/n They're going to great personal lengths to sacrifice for the rest of us ungrateful buggers, which might indicate that they're more likely to socialize, more likely to mingle, and thus more likely to get infected. We JUST DON'T KNOW
21/n And this is the problem with convenience samples, generally. We cannot use them to estimate population prevalence (how many people have had COVID-19), because they aren't representative of society as a whole
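If you want to see the mechanism in action, here is a toy simulation. It is a minimal sketch and every number in it is made up for illustration: I'm assuming 20% of people are "highly social", that they are more likely to be infected (10% vs 2%), and that they are 5 times more likely to turn up in the convenience sample. None of these values come from real data.

```python
import random


def simulate(n_people=200_000, sample_size=5_000, seed=0):
    """Compare a random sample with a biased convenience sample (toy model)."""
    rng = random.Random(seed)

    # Build a hypothetical population of (sociable, infected) pairs.
    population = []
    for _ in range(n_people):
        sociable = rng.random() < 0.20             # assumed share of social people
        p_infected = 0.10 if sociable else 0.02    # assumed infection risks
        population.append((sociable, rng.random() < p_infected))

    def prevalence(people):
        return sum(infected for _, infected in people) / len(people)

    # Random sample: everyone is equally likely to be picked.
    random_sample = rng.sample(population, sample_size)

    # Convenience sample: social people are (by assumption) 5x more likely
    # to show up. rng.choices draws with replacement, fine for a toy model.
    weights = [5 if sociable else 1 for sociable, _ in population]
    convenience_sample = rng.choices(population, weights=weights, k=sample_size)

    return {
        "true": prevalence(population),
        "random": prevalence(random_sample),
        "convenience": prevalence(convenience_sample),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name:12s} {value:.1%}")
```

Under these assumptions the random sample should land close to the true prevalence, while the convenience sample overshoots it. In real life the catch is that we don't know the weights, so we can't simply undo the bias.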
22/n So if you see a headline that says "x% of people infected with COVID-19!", take a leaf out of my mentor's book and ask: "WHAT'S THE DENOMINATOR?" It's a vitally important question
23/n THIS DOESN'T MEAN THAT CONVENIENCE SAMPLES ARE USELESS. I use them in my research. They are brilliant for quick, cheap tracking of rates of infection IN SELECT GROUPS. They also provide a brilliant window into change OVER TIME
24/n For example, if you sample blood donors every week for a year, you've got an amazing insight into the changing nature of the pandemic. THIS IS MASSIVELY IMPORTANT AND VERY CHEAP
25/n You just can't use those results to tell how many people in the rest of society have gotten COVID-19. But that doesn't mean the results aren't helpful at all