The COVID-19 pandemic has initiated an unprecedented worldwide effort to characterize the evolution of the coronavirus SARS-CoV-2 by mapping its mutations. The early identification of mutations that could confer adaptive advantages to the virus, such as higher infectivity or immune evasion, is of paramount importance. However, the large number of currently available genomes precludes the efficient use of phylogeny-based methods. Here we establish a fast and scalable early warning system based on Topological Data Analysis for the identification and surveillance of emerging adaptive mutations in large genomic datasets. Analyzing millions of SARS-CoV-2 genomes from GISAID, we demonstrate that topologically salient mutations are linked with an increase in infectivity or immune escape. We report on emerging potentially adaptive mutations as of January 2022, and pinpoint mutations in Variants of Concern that are likely due to convergent evolution. Our approach can improve the surveillance of mutations of concern, guide experimental studies, and aid vaccine development.