I am not going to comment on the reasons for the mistakes, because I obviously have no idea why the authors made them, but I think it's worth pointing out the very clear errors. Maybe they'll be corrected.
Before we look at the problems, it's worth noting that I'm not kidding: this paper has already had a big impact. Altmetric 452, the top publication from the journal it's in.
So, let's look at the errors. The paper is here: https://onlinelibrary.wiley.com/doi/abs/10.1111/ijcp.13528 …
Basically, the authors did two things: 1) calculated the average growth rate in cases of COVID-19, and for some reason called this average R(ADIR); 2) used linear regression to predict R(ADIR) using a few oddly chosen variables.
They found that the only thing that significantly* predicted R(ADIR) was cases/1,000 of COVID-19, and then extrapolated from the regression equation to conclude that if cases/1,000 = 6.6, R(ADIR) would be 0. (*STATISTICALLY)
This is the FIRST big mistake. You can't just assume linearity, and you certainly can't just extrapolate like that. For example, marathon times have been going down for years, but if we extrapolate linearly to the year 2100 they'll take 0 minutes to run!
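To make the extrapolation problem concrete, here is a minimal Python sketch. The finishing times are invented (they are not real marathon records) and were picked so that the fitted line reaches zero in 2100, like the example above. `fit_line` is just a hand-rolled least-squares helper, not anything from the paper.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Invented finishing times in minutes, NOT real records.
years = [1980, 1990, 2000, 2010, 2020]
times = [120.0, 110.0, 100.0, 90.0, 80.0]

intercept, slope = fit_line(years, times)

# Year at which the straight line predicts a 0-minute marathon.
zero_year = -intercept / slope
print(f"slope = {slope} min/year, line hits 0 minutes in {zero_year:.0f}")
```

The fit is perfect inside the observed range, and the prediction is still absurd outside it. A good fit says nothing about whether the line keeps going.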
You can run statistical tests to check all of these things and see if your assumptions are correct, but the authors didn't. This is first-year stats stuff, and it's just missed entirely.
Which brings us to the SECOND big mistake: collinearity. Without going into too much depth, it's bad statistical practice to regress two things that we know are very closely related.
In this case, the authors calculated a variable (R(ADIR)) from new case numbers, and then regressed it against case numbers. THESE TWO VARIABLES WILL ALWAYS BE CORRELATED BECAUSE THEY ARE CALCULATED FROM THE SAME INFORMATION.
It's like regressing your age in years against the average age you've been over the last decade, and then shouting "Eureka!" when you find they are closely tied. Again, a simple mistake, and something a first-year stats student is taught not to do.
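The age analogy can be checked in a few lines of Python. This is only a sketch: the age range is arbitrary, and "average age over the last decade" is read here as the mean of your age in each of the last 10 years, which is an assumption about what the analogy means.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Arbitrary illustrative ages.
ages = list(range(20, 61))

# Mean of your age in each of the last 10 years (this year included).
avg_last_decade = [sum(age - k for k in range(10)) / 10 for age in ages]

r = pearson_r(ages, avg_last_decade)
print(f"r = {r:.3f}")
```

The second variable is just the first one minus 4.5, so the correlation is perfect by construction. It tells you nothing new about the world.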
The THIRD big mistake is epidemiological. The authors assumed - with no evidence whatsoever - that an R(ADIR) of 0 meant immunity. This is WRONG.
Let's look at R(ADIR). It is basically the average of the ratio of new cases today to new cases over the last 5 days. So where does immunity come in??? The authors don't say.
Thing is, we can quite easily see many reasons for R(ADIR) to be 0 - this simply means that there are no new cases today (essentially). The most likely reason for no new cases? SOCIAL DISTANCING.
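Here is a rough Python sketch of that point. The `daily_ratio` function is a simplified reading of the description above (new cases today divided by total new cases over the previous 5 days), not the paper's exact formula, and the case counts and population are entirely hypothetical.

```python
def daily_ratio(new_cases, day, window=5):
    """New cases on `day` divided by total new cases over the previous `window` days.

    ASSUMPTION: a simplified stand-in for the paper's ratio, not its exact formula.
    Requires day >= window.
    """
    prior = sum(new_cases[day - window:day])
    return new_cases[day] / prior if prior > 0 else float("nan")


# Hypothetical outbreak that is stopped by distancing measures.
new_cases = [10, 20, 40, 60, 50, 30, 10, 0, 0, 0]
population = 1_000_000

ratios = [daily_ratio(new_cases, d) for d in range(5, len(new_cases))]
cases_per_1000 = sum(new_cases) / population * 1000

print("daily ratios:", [round(x, 3) for x in ratios])
print("total cases per 1,000:", cases_per_1000)
```

The ratio drops to 0 as soon as new cases stop, even though only a tiny fraction of this made-up population was ever infected. Nothing in the calculation knows or cares about immunity.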
Assuming - without any evidence whatsoever - that R(ADIR) = 0 means immunity is simply wrong. It is possible (with a vaccine) that this could be the case, but it is by no means plausible.
On to the FOURTH big mistake: multiplying weirdly. Basically they used that linear extrapolation to find that R(ADIR) = 0 when total cases/1,000 = 6.6, and then assumed that since this meant immunity, the other 993.4 people must've been exposed to COVID-19.
This is nonsensical. Even taking their entire approach at face value, the correlation between these two variables was only r^2 = 0.22. It is, again, simply wrong to just multiply the values out like this, because quite clearly there is more going on.
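For anyone who wants the arithmetic spelled out, here it is in Python. The only inputs are the two figures quoted above, and the variable names are mine.

```python
r_squared = 0.22               # r^2 quoted above
cases_per_1000_at_zero = 6.6   # cases/1,000 at which the fitted line gives R(ADIR) = 0

# Share of the variation in R(ADIR) that the regression does NOT account for.
unexplained_share = 1 - r_squared

# The "everyone else" figure the authors treated as exposed.
others_per_1000 = 1000 - cases_per_1000_at_zero

print(f"unexplained share of variance: {unexplained_share:.2f}")
print(f"remaining people per 1,000: {others_per_1000:.1f}")
```

So 78% of the variation is left unexplained by cases/1,000, and the 993.4 figure is a plain subtraction stacked on top of that weak fit.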
There are numerous other errors in the study, but I think I've made my point. If I were the author or the journal, I'd retract the study immediately. But that's just me.