SARS-CoV-2 gene content and COVID-19 mutation impact by comparing 44 Sarbecovirus genomes
IEEE Computer Society and GBC/ACM
online 7:00 PM, Thursday, 14 January 2021
COVID-19 math: gene content, mutations, and clinical trial analysis
Irwin Jungreis and Manolis Kellis
Register in advance for this webinar at https://acm-org.zoom.us/webinar/register/9416072160724/WN_y8wvok4XTM62W0...
After registering, you will receive a confirmation email containing information about joining the webinar.
Despite its overwhelming clinical importance, the SARS-CoV-2 gene set remains unresolved, hindering dissection of COVID-19 biology. Here, we use comparative genomics to provide a high-confidence protein-coding gene set, characterize protein-level and nucleotide-level evolutionary constraint, and prioritize functional mutations from the ongoing COVID-19 pandemic. We select 44 complete Sarbecovirus genomes at evolutionary distances ideally suited for protein-coding and non-coding element identification, create whole-genome alignments, and quantify protein-coding evolutionary signatures and overlapping constraint. We find strong protein-coding signatures for all named genes and for 3a, 6, 7a, 7b, 8, 9b, and also ORF3c, a novel alternate-frame gene. By contrast, ORF10 and the overlapping ORF 9c lack protein-coding signatures. Furthermore, we show that no other conserved protein-coding genes remain to be discovered. Cross-strain and within-strain evolutionary pressures largely agree at the gene, amino-acid, and nucleotide levels, with some notable exceptions, including fewer-than-expected mutations in nsp3 and Spike subunit S1, and more-than-expected mutations in Nucleocapsid. The latter also shows a cluster of amino-acid-changing variants in otherwise-conserved residues in a predicted B-cell epitope, which may indicate positive selection for immune avoidance. Several Spike-protein mutations, including D614G, which has been associated with increased transmission, disrupt otherwise-perfectly-conserved amino acids, and could be novel adaptations to human hosts. The resulting high-confidence gene set and evolutionary-history annotations provide valuable resources and insights on COVID-19 biology, mutations, and evolution.
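For readers new to the idea of a protein-coding evolutionary signature, the following is a toy sketch only, not the speakers' method. It assumes that one ingredient of such a signature is that true coding regions tend to accumulate codon changes that preserve the amino acid (synonymous) more readily than changes that alter it. The function name `count_codon_changes` and the two short sequences are hypothetical and chosen purely for illustration.

```python
# Toy illustration: classify codon differences between two aligned,
# in-frame sequences as synonymous or amino-acid-changing.
# Hypothetical helper names; not the analysis pipeline described in the talk.

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def count_codon_changes(seq_a: str, seq_b: str) -> tuple[int, int]:
    """Return (synonymous, nonsynonymous) counts of differing codons.

    Both sequences must be aligned, equal in length, and in frame.
    Codons containing gaps or ambiguous bases are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    synonymous = nonsynonymous = 0
    for pos in range(0, len(seq_a) - 2, 3):
        codon_a = seq_a[pos:pos + 3].upper()
        codon_b = seq_b[pos:pos + 3].upper()
        if codon_a == codon_b:
            continue  # identical codons carry no signal here
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue  # gap or ambiguous base: skip
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous


if __name__ == "__main__":
    # Made-up sequences: GAT->GGT changes Asp to Gly, CTT->CTC keeps Leu.
    syn, nonsyn = count_codon_changes("ATGGATCTTTAA", "ATGGGTCTCTAA")
    print(f"synonymous={syn} nonsynonymous={nonsyn}")
```

A real analysis would work on a whole-genome alignment of many strains and use a statistical model rather than raw counts; the sketch only shows the kind of distinction such a model builds on.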
After obtaining a PhD in pure math from Harvard University, Irwin Jungreis embarked on a 17-year career as a software engineer, manager, executive, and entrepreneur in the Computer-Aided Design software industry. Inspired by Darwin's "On the Origin of Species", he decided to leave the software industry behind and become a biologist. While filling the gaps in his biological education, he encountered Manolis Kellis's computational biology class at MIT, where he fell in love with the idea of using the imprint of evolution to reveal the secrets of molecular biology. He joined the Kellis lab as a Research Scientist, where he has worked for the last eleven years, with a focus on using evolutionary signatures to detect novel protein-coding genes and unusual cases of protein translation, including stop codon readthrough.
Manolis Kellis is a Professor of Computer Science at MIT, an Institute Member of the Broad Institute of MIT and Harvard, and a member of the Computer Science and Artificial Intelligence Lab at MIT, where he directs the MIT Computational Biology Group (compbio.mit.edu). He has helped direct several large-scale genomics projects, including the NIH Roadmap Epigenomics project, the comparative analysis of 29 mammals, the Encyclopedia of DNA Elements (ENCODE) project, and the Genotype-Tissue Expression (GTEx) project. He received the US Presidential Early Career Award for Scientists and Engineers (PECASE), the NSF CAREER award, and the Alfred P. Sloan Fellowship. He obtained his Ph.D. from MIT, where he received the Sprowls award for the best doctoral thesis in computer science. He lived in Greece and France before moving to the US.
This joint meeting of the Boston Chapter of the IEEE Computer Society and GBC/ACM will be online only due to the COVID-19 lockdown.
Slides for this talk are now available at https://ewh.ieee.org/r1/boston/computer/IrwinSlides1-14-21.pdf.