I have been corresponding with the authors of the well-known Santa Clara County COVID-19 preprint, and I am alarmed at their sloppy behavior. The confidence interval calculation in their preprint made demonstrable math errors - *not* just questionable methodological choices.
Everyone makes mistakes, but the record must be corrected ASAP. I emailed them on Saturday morning asking them to do so. Three days later they have not corrected anything, but a subset of them have released a new study without saying how they did the analysis this time.
Given the critically important and time-sensitive policy decisions being made now, if the authors are still pressing their case in the media using possibly incorrect calculations, then I feel I should make my criticism public too.
The errors are not debatable and can be seen in these two screenshots of the supplement. First, 0.0034, the standard error meant to measure uncertainty about prevalence pi, is not the square root of 0.039. Second, the variance of a binomial estimate of a proportion depends on the sample size.
Image
Image
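Both points can be checked with a few lines of arithmetic. The sketch below assumes that 0.039 is the variance figure shown in the screenshots, and it borrows the 50-out-of-3330 count mentioned later in this thread purely as an illustrative proportion; the function name `binomial_se` and the alternative sample sizes are hypothetical, not taken from the preprint.

```python
import math

# Figures quoted in the tweet above (assumed: 0.039 is the stated variance).
reported_se = 0.0034
reported_variance = 0.039

# A standard error is the square root of the variance.
implied_se = math.sqrt(reported_variance)
print(f"sqrt({reported_variance}) = {implied_se:.4f}, reported SE = {reported_se}")


def binomial_se(p_hat: float, n: int) -> float:
    """Standard error of a sample proportion p_hat from n independent trials."""
    return math.sqrt(p_hat * (1.0 - p_hat) / n)


# Illustrative proportion only: 50 positives out of 3330, from later in the thread.
p_hat = 50 / 3330

# The same proportion gives a different standard error at each sample size.
for n in (100, 3330, 100_000):
    print(f"n = {n:>7}: SE = {binomial_se(p_hat, n):.5f}")
```

The square root of 0.039 is roughly 0.197, nowhere near 0.0034, and the loop shows the standard error shrinking as n grows, which is why a variance formula with no sample size in it cannot be right for a binomial proportion.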
I can't redo the whole calculation myself because parts were not described anywhere, but I have low confidence that those parts were done correctly; if not, the corrected confidence interval for prevalence in Santa Clara County might well stretch all the way to include zero.
The authors said by email that they used a built-in Stata function and aren't sure themselves how the software used the input weights. I suspect they misapplied that function (too complicated to tweet why) but I don't know Stata well enough to be sure; it seems neither do they.
Trevor Hastie, Steve Goodman, and I have been asking for more information about their analysis and their data. The authors have been gracious and reasonably responsive; they say they are redoing the stats in a second draft and will share data if possible.
I was satisfied to wait for the second draft, which is supposedly imminent and which the authors assured us will include a detailed description and code for a different confidence interval method based on bootstrapping.
However, I'm flabbergasted that yesterday afternoon, a group including several of the same authors circulated a press release describing new results for a similar study in LA County, without any accompanying technical report and before correcting the Santa Clara County preprint.
Whichever is the case, before journalists publicize any more results from this group, they should know that the confidence intervals reported in both studies have no known statistical provenance as of now. The calculations are not questionable; they are either wrong or unknown.
shows the SCC data are too noisy even to rule out the possibility that all the positives are false positives. Simply put, the difference between 50 heads in 3330 flips (SCC residents) and 2 heads in 371 flips (negative controls*) isn't statistically significant.
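One way to check the coin-flip comparison is a two-sided Fisher's exact test on the 2x2 table of counts. This is a minimal sketch: the choice of Fisher's exact test is an assumption (the analysis referred to above may have used a different test), and the helper names are hypothetical. It uses only the standard library.

```python
from math import comb


def hypergeom_pmf(k: int, total: int, positives: int, draws: int) -> float:
    """P(exactly k positives among `draws` items sampled without replacement)."""
    return comb(positives, k) * comb(total - positives, draws - k) / comb(total, draws)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of every table (with the same margins) that is
    no more likely than the observed one.
    """
    total = a + b + c + d
    positives = a + c          # column total: all positive results
    draws = c + d              # row total: size of the second group
    lo = max(0, draws - (total - positives))
    hi = min(positives, draws)
    p_observed = hypergeom_pmf(c, total, positives, draws)
    p_value = 0.0
    for k in range(lo, hi + 1):
        p_k = hypergeom_pmf(k, total, positives, draws)
        if p_k <= p_observed * (1 + 1e-9):  # small tolerance for float ties
            p_value += p_k
    return min(1.0, p_value)


# Counts from the tweet: 50 of 3330 residents positive, 2 of 371 controls positive.
p_value = fisher_exact_two_sided(50, 3330 - 50, 2, 371 - 2)
print(f"two-sided Fisher exact p-value: {p_value:.3f}")
```

Under this test the p-value comes out above the conventional 0.05 threshold, which agrees with the statement that the difference is not statistically significant.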
The authors have demographic information they have not yet shared, so it's conceivable a more refined analysis will pin down the prevalence more precisely. My point is that right now, as far as I know, no such analysis exists.
Thanks also to for his perceptive tweets about the paper that piqued my interest in the first place.
Quote Tweet
Ok, so what's wrong with the confidence intervals in this preprint? Well they publish a confidence interval on the specificity of the test that runs between 98.3% and 99.9%, but only 1.5% of all the tests came back positive! 1/
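The tension in the quoted tweet can be made concrete with a short calculation. This sketch assumes the 50 positives out of 3330 tests mentioned earlier in the thread as the source of the 1.5% figure; the variable names are illustrative only.

```python
# Counts from earlier in the thread (assumed to be the source of the 1.5% figure).
n_tests = 3330
n_positive = 50
positive_rate = n_positive / n_tests

# Specificity confidence interval quoted in the tweet above.
spec_low, spec_high = 0.983, 0.999

print(f"observed positive rate: {positive_rate:.2%}")
for spec in (spec_low, spec_high):
    false_positive_rate = 1.0 - spec
    # False positives expected if nobody tested were truly positive.
    expected_false_positives = false_positive_rate * n_tests
    print(
        f"specificity {spec:.1%}: false positive rate {false_positive_rate:.1%}, "
        f"about {expected_false_positives:.0f} false positives expected in {n_tests} tests"
    )
```

At the lower end of the interval, the false positive rate alone (1.7%) exceeds the observed positive rate, so every observed positive could be a false positive; at the upper end it does not.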