Hepatitis C virus (left) is my teaching example of a consensus nomenclature based on sequence variation. It benefits from a deep evolutionary history leading to visually distinct clades (confusingly termed "genotypes").
Thanks to a global sequencing and data sharing effort, we have an unprecedented picture of the evolution of #SARSCoV2, but the compressed time scale and direct sampling of ancestors mean there are few gaps we can use to define categories.
I used IQTREE in both cases, with protein sequences for the LANL-curated HCV reference genomes (to get better estimates of internal branch lengths), and nucleotide sequences for SC2. So the scale comparison is VERY crude!
In my experience, the virus-specific database (which has been around for months, if we're talking about the same thing) does not contain all available records in the general NCBI database - perhaps because it is manually curated.
Would building trees from data collected within a window of time help, or reduce the data too substantially? If one of the problems is sampling ancestors, could that solve the issue for some trees?
Interesting take on the problem. I think that, on the time scale of the pandemic, many ancestral variants are still relevant clinically and epidemiologically. We also have to deal with genomes from old samples with delayed release.