Inferring who infected whom in an outbreak is essential for characterising transmission dynamics and guiding public health interventions. However, this task is challenging because surveillance data are limited and the underlying immunological and social interactions are complex. Instead of a single definitive transmission tree, epidemiologists therefore often consider a set of plausible trees, termed an \textit{epidemic forest}. Different inference methods and assumptions can yield different epidemic forests, yet no formal test exists to assess whether these differences are statistically significant. We propose a framework for such testing based on two approaches: a chi-square test and permutational multivariate analysis of variance (PERMANOVA). We assessed each method's ability to distinguish simulated epidemic forests generated under different offspring distributions. Both methods achieved perfect specificity for forests of at least 100 trees, but PERMANOVA was consistently more sensitive than the chi-square test across all epidemic and forest sizes. This framework, implemented in the R package \textit{mixtree}, provides the first statistical means of robustly comparing epidemic forests.
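To make the idea of testing two epidemic forests against each other concrete, the following is a minimal Python sketch and not the \textit{mixtree} API. It rests on several assumptions that the text above does not state: each tree is stored as a mapping from infectee to infector, the chi-square statistic is computed on a contingency table of infector-infectee pair counts per forest, and the p-value comes from permuting trees between forests rather than from the asymptotic chi-square distribution (a substitution made only to keep the example dependency-free). All function names, the toy simulator, and its `hub_weight` parameter are hypothetical. PERMANOVA is not shown.

```python
import random
from collections import Counter


def simulate_tree(n_cases, rng, hub_weight):
    """Toy simulator (hypothetical): each case picks an infector among earlier cases.

    hub_weight > 1 makes case 0 a more likely infector, which skews the
    offspring distribution towards a single superspreader.
    """
    tree = {}
    for case in range(1, n_cases):
        weights = [hub_weight] + [1.0] * (case - 1)
        tree[case] = rng.choices(range(case), weights=weights)[0]
    return tree  # maps infectee -> infector; the index case 0 has no entry


def pair_counts(forest):
    """Count how often each (infector, infectee) pair occurs across a forest."""
    counts = Counter()
    for tree in forest:
        for infectee, infector in tree.items():
            counts[(infector, infectee)] += 1
    return counts


def chi_square_stat(forests):
    """Pearson chi-square statistic for a pairs-by-forests contingency table."""
    tables = [pair_counts(f) for f in forests]
    pairs = set().union(*tables)
    col_totals = [sum(t.values()) for t in tables]
    grand_total = sum(col_totals)
    stat = 0.0
    for pair in pairs:
        row_total = sum(t[pair] for t in tables)
        for table, col_total in zip(tables, col_totals):
            expected = row_total * col_total / grand_total
            stat += (table[pair] - expected) ** 2 / expected
    return stat


def permutation_test(forest_a, forest_b, n_perm=199, seed=0):
    """Permutation p-value: shuffle trees between the two forests."""
    rng = random.Random(seed)
    observed = chi_square_stat([forest_a, forest_b])
    pooled = list(forest_a) + list(forest_b)
    n_a = len(forest_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        permuted = chi_square_stat([pooled[:n_a], pooled[n_a:]])
        if permuted >= observed - 1e-9:  # small tolerance for float rounding
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = random.Random(42)
    # Two forests of 100 trees each over the same 10 cases (assumed sizes).
    uniform = [simulate_tree(10, rng, hub_weight=1.0) for _ in range(100)]
    skewed = [simulate_tree(10, rng, hub_weight=20.0) for _ in range(100)]
    stat, p_value = permutation_test(uniform, skewed)
    print(f"chi-square statistic = {stat:.1f}, permutation p-value = {p_value:.3f}")
```

Swapping the second forest for another draw with `hub_weight=1.0` gives a rough feel for specificity, since both forests then share the same generating process and a small p-value would be a false positive.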