In a new study, I identify and recover a deleted set of #SARSCoV2 sequences that provide additional information about viruses from the early Wuhan outbreak: https://www.biorxiv.org/content/10.1101/2021.06.18.449051v1 … (1/n)
Using this approach, I recovered files for the 34 early samples that were virus positive. I was able to use the data in the files to reconstruct partial viral sequences (from start of spike to end of ORF10) for 13 of these samples. (6/n)
Now I need to give background to explain a confusing scientific mystery about other early #SARSCoV2 sequences. Although events that led to emergence of #SARSCoV2 in Wuhan are unclear (zoonosis vs lab accident), everyone agrees deep ancestors are coronaviruses from bats. (7/n)
Instead, early Huanan Seafood Market #SARSCoV2 viruses are more different from bat coronaviruses than #SARSCoV2 viruses collected later in China and even other countries. @lpipes @ras_nielsen give nice technical analysis at https://academic.oup.com/mbe/article/38/4/1537/6028993 … (9/n)
The conundrum is easily seen by plotting the relative differences from the bat coronavirus RaTG13 outgroup versus collection date for early #SARSCoV2. See how the first reported viruses from Wuhan (leftmost blue points) aren’t the closest to RaTG13. (10/n) pic.twitter.com/YuVp4efUNq
Same result if we use other bat coronaviruses like RpYN06 or RmYN02. To see this, go to https://jbloom.github.io/SARS-CoV-2_PRJNA612766/deltadist.html … for an interactive plot that allows you to select the bat coronavirus outgroup and mouse over points for strain details. (11/n)
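A minimal sketch of the kind of comparison described above, not the code from the linked repository. It assumes the sequences are already aligned to the outgroup, counts mismatches while skipping gaps and ambiguous bases, and reports each count relative to a chosen reference strain. The function names, strain names, and toy sequences are all hypothetical.

```python
def count_differences(seq_a, seq_b):
    """Count mismatched positions between two aligned sequences.

    Positions where either sequence has a gap ("-") or an ambiguous
    base ("N") are skipped rather than counted as differences.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    ignored = {"-", "N"}
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != b and a not in ignored and b not in ignored
    )


def relative_distance_to_outgroup(sequences, outgroup, reference_name):
    """Differences from the outgroup, relative to one reference strain.

    A negative value means the strain is closer to the outgroup than
    the reference strain is.
    """
    reference_distance = count_differences(sequences[reference_name], outgroup)
    return {
        name: count_differences(seq, outgroup) - reference_distance
        for name, seq in sequences.items()
    }


# Toy aligned sequences (hypothetical, far shorter than a real genome).
outgroup = "ACGTACGTAC"
sequences = {
    "market_like": "ACGAACGAAC",   # 2 differences from the outgroup
    "later_sample": "ACGTACGAAC",  # 1 difference from the outgroup
    "masked": "ACGNACGAAC",        # N is skipped, so 1 counted difference
}

deltas = relative_distance_to_outgroup(sequences, outgroup, "market_like")
for name, delta in deltas.items():
    print(name, delta)
```

Plotting these relative values against collection date would give a figure of the same general shape as the one in the thread; swapping the `outgroup` string is all it takes to repeat the comparison with a different bat coronavirus.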
How do deleted sequences I recovered relate to this conundrum? If we include those sequences, and note 4 sequences from Guangdong are from two groups of people infected in Wuhan in late Dec / early Jan, we get plausible scenarios that resolve above problems. (12/n)
These two scenarios are plotted below. Each has a different “progenitor”, which is the sequence that gave rise to all *currently* known #SARSCoV2 sequences (still may not be virus that infected patient zero if other early sequences remain unknown). (13/n) pic.twitter.com/3k7eHrgNgf
Both putative progenitors have 3 mutations relative to Seafood Market viruses that make them more similar to bat coronavirus. One is progenitor inferred by @kumar_lab @sergeilkp et al (https://academic.oup.com/mbe/advance-article/doi/10.1093/molbev/msab118/6257226 …), other has C8782T, T28144C, and C29095T relative to Wuhan-Hu-1. (14/n)
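For readers unfamiliar with the notation, a label like C8782T means the reference has C at 1-based position 8782 and the variant has T there. Below is a small sketch of parsing such labels and applying them to a reference string. The helper names and the toy reference are hypothetical (the real Wuhan-Hu-1 genome is roughly 30,000 bases), and this is an illustration rather than the analysis code from the paper.

```python
import re

# Labels have the form <reference base><1-based position><new base>.
MUTATION_PATTERN = re.compile(r"^([ACGT])(\d+)([ACGT])$")


def parse_mutation(label):
    """Split a label such as 'C8782T' into (reference base, position, new base)."""
    match = MUTATION_PATTERN.match(label)
    if match is None:
        raise ValueError(f"unrecognized mutation label: {label}")
    ref_base, position, new_base = match.groups()
    return ref_base, int(position), new_base


def apply_mutations(reference, labels):
    """Return a copy of the reference with each labeled substitution applied.

    Raises ValueError if a label's reference base does not match the
    sequence, which usually signals a coordinate or reference mix-up.
    """
    bases = list(reference)
    for label in labels:
        ref_base, position, new_base = parse_mutation(label)
        if bases[position - 1] != ref_base:
            raise ValueError(
                f"{label}: reference has {bases[position - 1]} at position {position}"
            )
        bases[position - 1] = new_base
    return "".join(bases)


# The three labels from the thread parse as expected.
for label in ["C8782T", "T28144C", "C29095T"]:
    print(parse_mutation(label))

# Toy reference (hypothetical) to show the substitution step.
toy_reference = "ACGTCCTA"
toy_variant = apply_mutations(toy_reference, ["C2T", "T7C"])
print(toy_variant)
```

Checking the stated reference base before substituting is a cheap guard: if the label and the sequence disagree, the coordinates are probably off by one or refer to a different reference.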
Both progenitors suggest #SARSCoV2 was circulating in Wuhan before December outbreak at Huanan Seafood Market, which is corroborated by lots of other evidence, including news articles from China in early 2020 (see intro to my paper linked in first Tweet in this thread). (15/n)
There are also broader implications. First, fact this dataset was deleted should make us skeptical that all other relevant early Wuhan sequences have been shared. We already know many labs in China were ordered to destroy early samples: https://www.scmp.com/news/china/society/article/3084635/china-confirms-unauthorised-labs-were-told-destroy-early … (16/n) pic.twitter.com/ajtm8SxfVu
Sequence sharing could be further limited by fact that scientists in China are under an order from the State Council requiring central approval of all publications: https://apnews.com/article/united-nations-coronavirus-pandemic-china-only-on-ap-bats-24fbadc58cee3a40bca2ddf7a14d2955 … (17/n) pic.twitter.com/rGwfFUONTn
Second major implication is that it may be possible to obtain additional information about early spread of #SARSCoV2 in Wuhan even if efforts for more on-the-ground investigations are stymied. (18/n)
Scientific communication and data sharing typically rely on trust. The NIH Sequence Read Archive has >13,000,000 runs, so they have to trust authors when they request deletions, as it's not feasible to validate reasons for all requests, some of which are legitimate. (19/n)
In case of data set I describe above, it seems possible that trust that the NIH Sequence Read Archive grants to scientific authors to delete data may have been used to obscure sequences informative for understanding early #SARSCoV2. (20/n)
Fortunately, Sequence Read Archive has rigorous data tracking enabling them to determine when data were deleted & stated justification by authors. In fact, @NIHDirector @NCBI have already determined this & generously shared info w me, but will let them share more widely. (21/n)
It is important to examine if other trust-based systems in science conceivably may have also been used to hide data relevant to origins / early spread of #SARSCoV2. This includes not only looking more at sequence databases, but also paper reviews, grant reporting, etc. (22/n)
Third major implication is that scientists need to stay focused on data-driven study of #SARSCoV2 origins / early spread. After spending the last 4 months studying this closely, I am cautiously optimistic that additional relevant data are still likely to come to light. (23/n)
We should therefore avoid dogmatic arguments about #SARSCoV2 origins / early spread, and instead focus on following two questions: (1) How can we get more data? (2) How can we better analyze the data we have? (24/n)
Finally, my analysis is on GitHub at https://github.com/jbloom/SARS-CoV-2_PRJNA612766 … where you can access all code, data, & paper drafts. All updates are via time-stamped commits. This ensures transparency/reproducibility of this study are not in doubt, regardless of your views on interpretation. (25/n)