The occurrence of multiple strains of a bacterial pathogen such as M. tuberculosis or C. difficile within a single human host, referred to as a mixed infection, has important implications for both healthcare and public health. However, methods for detecting mixed infections from whole-genome sequencing (WGS) data, and especially for determining the proportions and identities of the underlying strains, have been limited. In this paper we introduce SplitStrains, a novel method for addressing these challenges. Grounded in a rigorous statistical model, SplitStrains not only outperforms existing methods at proportion estimation on both simulated and real M. tuberculosis data, but also successfully determines the identities of the underlying strains. We conclude that SplitStrains is a powerful addition to the existing toolkit of analytical methods for bacterial pathogen data and holds the promise of enabling previously inaccessible conclusions in public health microbiology.