One fundamental statistical task in microbiome data analysis is differential abundance analysis, which aims to identify microbial taxa whose abundance covaries with a variable of interest. Although the main interest lies in changes in absolute abundance, i.e., the number of microbial cells per unit area or volume at an ecological site such as the human gut, the data from a sequencing experiment reflect only the relative abundances of the taxa in a sample. Microbiome data are therefore compositional in nature. Analyzing such compositional data is challenging because a change in the absolute abundance of one taxon changes the relative abundances of the other taxa, which makes false positive control difficult. Here we present a simple yet robust and highly scalable approach to tackle compositional effects in differential abundance analysis. The method requires only the application of established statistical tools. It fits linear regression models on the centered log-ratio transformed data, identifies a bias term due to the transformation and the compositional effect, and corrects the bias using the mode of the regression coefficients. Owing to its algorithmic simplicity, our method is 100 to 1000 times faster than the state-of-the-art method ANCOM-BC. Under mild assumptions, we prove its asymptotic FDR control property, making it the first differential abundance method that enjoys a theoretical FDR control guarantee. The proposed method is very flexible and can be extended to mixed-effect models for the analysis of correlated microbiome data. Using comprehensive simulations and real data applications, we demonstrate that our method has the best overall performance among the competitors in terms of FDR control and power. We implemented the proposed method in the R package LinDA (https://github.com/zhouhj1994/LinDA).
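
The three steps named above (regression on centered log-ratio data, a shared bias term, correction by the mode of the coefficients) can be illustrated with a minimal Python sketch. This is not the LinDA implementation. The function names, the pseudocount of 0.5, the single covariate, and the use of a Gaussian kernel density argmax as the mode estimator are all assumptions made for illustration. The sketch also assumes that most taxa are not differentially abundant, so that their raw coefficients cluster around the bias. It covers only the bias correction and omits the hypothesis tests and FDR control.

```python
import math
import statistics


def clr(counts, pseudo=0.5):
    """Centered log-ratio transform of one sample (pseudocount is an assumption)."""
    logs = [math.log(c + pseudo) for c in counts]
    center = sum(logs) / len(logs)
    return [v - center for v in logs]


def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x with an intercept."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx


def density_mode(values, grid_size=512):
    """Mode estimate: argmax of a Gaussian kernel density evaluated on a grid.

    Hypothetical stand-in for whichever mode estimator the authors use.
    """
    n = len(values)
    h = 1.06 * statistics.stdev(values) * n ** (-0.2)  # rule-of-thumb bandwidth
    if h == 0.0:
        h = 1e-8  # all values identical; any positive bandwidth works
    lo, hi = min(values), max(values)
    best_x, best_d = lo, -1.0
    for i in range(grid_size):
        g = lo + (hi - lo) * i / (grid_size - 1)
        d = sum(math.exp(-0.5 * ((g - v) / h) ** 2) for v in values)
        if d > best_d:
            best_x, best_d = g, d
    return best_x


def bias_corrected_slopes(count_rows, x):
    """Return (corrected slopes per taxon, estimated bias).

    count_rows: one list of taxon counts per sample.
    x: one covariate value per sample.
    """
    z = [clr(row) for row in count_rows]
    n_taxa = len(count_rows[0])
    # Step 1: one regression per taxon on the CLR-transformed data.
    raw = [ols_slope(x, [zi[j] for zi in z]) for j in range(n_taxa)]
    # Step 2: the shared bias is estimated by the mode of the raw slopes.
    bias = density_mode(raw)
    # Step 3: subtract the bias from every coefficient.
    return [b - bias for b in raw], bias
```

Calling `bias_corrected_slopes(count_rows, x)` with a samples-by-taxa list of counts and a per-sample covariate returns the corrected coefficients together with the estimated bias. In a setting where a few taxa increase in absolute abundance, the raw slopes of the unchanged taxa are pushed below zero by the shift in the log-ratio center, and subtracting the mode moves them back toward zero.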