Scientific studies conducted during the last two decades have established the central role of the microbiome in disease and health. Differential abundance analysis aims to identify microbial taxa associated with two or more sample groups defined by attributes such as disease subtype, geography, or environmental condition. The results, in turn, help clinical practitioners and researchers diagnose disease and develop new treatments more effectively. However, detecting differential abundance is uniquely challenging due to the high dimensionality, collinearity, sparsity, and compositionality of microbiome data. Further, there is a critical need for unified statistical approaches that can directly compare more than two groups and appropriately adjust for covariates. We develop a zero-inflated Bayesian nonparametric (ZIBNP) methodology that meets the multipronged challenges posed by microbiome data and identifies differentially abundant taxa in two or more groups, while also accounting for sample-specific covariates. The proposed hierarchical model flexibly adapts to unique data characteristics, casts the typically high proportion of zeros in a censoring framework, and mitigates high dimensionality and collinearity issues by utilizing the dimension-reducing property of the semiparametric Chinese restaurant process. The approach relates the microbiome sampling depths to inferential precision and conforms with the compositional nature of microbiome data. In simulation studies and in the analyses of the CAnine Microbiome during Parasitism (CAMP) data on infected and uninfected dogs, and the Global Gut microbiome data on human subjects belonging to three geographical regions, we compare ZIBNP with established statistical methods for differential abundance analysis in the presence of covariates.
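The dimension-reducing property attributed to the Chinese restaurant process can be illustrated with a small simulation. The Python sketch below is not the ZIBNP model; it draws from a plain one-parameter Chinese restaurant process to show that many items (here standing in for taxa) tend to fall into far fewer clusters. The function name `crp_assignments`, the concentration parameter `alpha = 2.0`, and the choice of 500 items are illustrative assumptions and do not come from the paper.

```python
import random


def crp_assignments(n_items, alpha, seed=0):
    """Draw cluster labels from a one-parameter Chinese restaurant process.

    Hypothetical helper for illustration only. Item i (0-based) joins an
    existing cluster k with probability counts[k] / (i + alpha) and opens
    a new cluster with probability alpha / (i + alpha).
    """
    rng = random.Random(seed)
    assignments = []  # cluster label for each item, in arrival order
    counts = []       # current size of each cluster
    for _ in range(n_items):
        # Unnormalised weights: existing cluster sizes, then alpha for a new one.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)   # open a new cluster
        else:
            counts[k] += 1     # join an existing cluster
        assignments.append(k)
    return assignments, counts


labels, sizes = crp_assignments(n_items=500, alpha=2.0)
print(len(sizes), "clusters for", len(labels), "items")
```

Under this process the number of clusters grows roughly logarithmically with the number of items, which is why clustering of this kind can reduce the effective number of parameters in a high-dimensional problem.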