Microbial communities are widely studied using high-throughput sequencing techniques such as 16S rRNA gene sequencing. These techniques have attracted biologists because they offer powerful tools to explore microbial communities and investigate their patterns of diversity in biological and biomedical samples at remarkable resolution. However, the accuracy of these methods can be negatively affected by the presence of contamination. Several studies have recognized that contamination is a common problem in microbial studies and have offered promising computational and laboratory-based approaches to assess and remove contaminants. Here we propose a novel strategy, the mutual information (MI)-based filtering method, which uses information-theoretic functionals and graph theory to identify and remove contaminants. We applied the MI-based filtering method to a mock community data set and evaluated the amount of information lost by filtering taxa. We also compared our method to commonly practiced traditional filtering methods. In the mock community data set, the MI-based filtering approach retained the true bacteria in the community without significant loss of information. Our results indicate that the MI-based filtering method effectively identifies and removes contaminants in microbial communities and can therefore be a useful filtering method for microbiome studies. We believe our filtering method has two advantages over traditional filtering methods: first, it does not require an arbitrary choice of threshold; second, it is able to detect true taxa with low abundance.
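The abstract names two ingredients, mutual information between taxa and a graph built over taxa, without spelling out the full procedure. The sketch below illustrates only those ingredients, under stated assumptions: each taxon is represented by a hypothetical presence/absence profile across samples, MI is estimated with the simple plug-in (empirical frequency) estimator in bits, and taxa are joined by an edge weighted by their MI. It is not the authors' filtering algorithm; the names `mutual_information` and `mi_graph` are illustrative, and the tiny tolerance only discards zero-MI pairs affected by floating-point noise (it is not a filtering threshold).

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in bits for two equal-length discrete sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    joint = Counter(zip(x, y))  # empirical joint counts
    px = Counter(x)             # empirical marginal counts of x
    py = Counter(y)             # empirical marginal counts of y
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def mi_graph(profiles, tol=1e-12):
    """Hypothetical taxon graph: an edge (t1, t2) carries the MI between
    the two taxa's profiles; pairs with (numerically) zero MI get no edge."""
    taxa = list(profiles)
    edges = {}
    for t1, t2 in combinations(taxa, 2):
        w = mutual_information(profiles[t1], profiles[t2])
        if w > tol:
            edges[(t1, t2)] = w
    return edges


# Illustrative presence/absence profiles over four samples (made-up data).
profiles = {
    "A": [0, 1, 0, 1],
    "B": [0, 1, 0, 1],
    "C": [0, 0, 1, 1],
}
print(mi_graph(profiles))  # {('A', 'B'): 1.0}
```

Here taxa A and B co-occur perfectly and share 1 bit of information, while C is statistically independent of both and ends up with no edges. How such a graph is then used to decide which taxa are contaminants is a design choice of the actual method and is not reproduced here.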