As the major target of many vaccines and neutralizing antibodies against SARS-CoV-2, the spike (S) protein is observed to mutate over time. In this paper, we present statistical approaches to tackle some challenges associated with the analysis of S-protein data. We build a Bayesian hierarchical model to study the temporal and spatial evolution of S-protein sequences, after grouping the sequences into representative clusters. We then apply sampling methods to investigate possible changes to the S-protein's 3-D structure as a result of commonly observed mutations. While the increasing spread of D614G variants has been noted in other research, our results also show that the co-occurring mutations of D614G together with S477N or A222V may spread even more rapidly, as quantified by our model estimates.