To stop the spread of infectious diseases, pathogen genomic data can be used to reconstruct transmission events and characterize population-level sources of infection. Most approaches for identifying transmission pairs do not account for the time elapsed since the pathogen variants in the two individuals diverged, which is problematic for viruses with high within-host evolutionary rates. This prompts us to assess possible transmission pairs using both phylogenetic data and additional estimates of time since infection derived from clinical biomarkers. We develop Bayesian mixture models with an evolutionary clock as the signal component and with additional mixed effects or covariate random functions describing the mixing weights, in order to classify potential pairs into likely and unlikely transmission pairs. We demonstrate that although sources cannot be identified with certainty at the individual level, even with the additional data on time elapsed, inferences about the population-level sources of transmission are possible, and are more accurate than those based on phylogenetic data alone, without time since infection estimates. We apply the approach to estimate age-specific sources of HIV infection in Amsterdam MSM transmission networks between 2010 and 2021. This study demonstrates that infection time estimates provide informative data for characterizing transmission sources, and shows how phylogenetic source attribution can then be done with multi-dimensional mixture models.
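To make the classification idea concrete, the following is a minimal sketch of a two-component mixture, not the model developed in this study. It assumes a Gamma-distributed signal component whose mean genetic distance grows linearly with time elapsed (a simple clock), a uniform background component for unlinked pairs, and a fixed mixing weight instead of one driven by mixed effects or covariates. All function names and parameter values (`clock_rate`, `shape`, `max_distance`, `weight`) are hypothetical and chosen only for illustration.

```python
import math


def gamma_pdf(x, shape, rate):
    """Density of a Gamma(shape, rate) distribution at x > 0."""
    log_pdf = (shape * math.log(rate) - math.lgamma(shape)
               + (shape - 1.0) * math.log(x) - rate * x)
    return math.exp(log_pdf)


def signal_pdf(distance, elapsed_years, clock_rate=0.005, shape=4.0):
    """Signal component (assumed form): the expected genetic distance of a
    true transmission pair is clock_rate * elapsed_years."""
    mean = clock_rate * elapsed_years
    return gamma_pdf(distance, shape, shape / mean)


def background_pdf(distance, max_distance=0.2):
    """Background component (assumed form): unlinked pairs have distances
    spread uniformly on (0, max_distance), regardless of time elapsed."""
    return 1.0 / max_distance if 0.0 < distance <= max_distance else 0.0


def pair_probability(distance, elapsed_years, weight=0.5):
    """Posterior probability that a pair belongs to the signal component,
    given a fixed prior mixing weight (hypothetical value)."""
    if distance <= 0.0 or elapsed_years <= 0.0:
        raise ValueError("distance and elapsed_years must be positive")
    signal = weight * signal_pdf(distance, elapsed_years)
    background = (1.0 - weight) * background_pdf(distance)
    return signal / (signal + background)


# Two hypothetical pairs, both with 4 years elapsed since divergence:
# one close to the clock expectation (0.02), one far above it (0.12).
p_close = pair_probability(0.02, 4.0)
p_far = pair_probability(0.12, 4.0)
print(f"close pair: {p_close:.3f}, distant pair: {p_far:.2e}")
```

Under these assumptions, a pair whose distance matches what the clock predicts for the elapsed time receives a high membership probability, while a pair that is too divergent for the elapsed time is assigned to the background. Summing such probabilities over groups of potential sources is one way to move from uncertain individual-level pairs to population-level attribution.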