Modern microbiome compositional data are often high-dimensional and exhibit complex dependencies among microbial taxa. However, existing statistical models for such data either do not adequately account for these dependencies or lack computational scalability with respect to the number of taxa. This presents challenges in important applications such as association analysis between microbiome composition and disease risk, in which valid statistical analysis requires appropriately incorporating the "variance components" or "random effects" in the microbiome composition. We introduce a generative model, called the "logistic-tree normal" (LTN) model, that addresses this need. LTN marries two popular classes of models, namely log-ratio normal (LN) and Dirichlet-tree (DT) models, and inherits the key benefits of each. LTN incorporates the tree-based binomial decomposition as the DT does, but it jointly models the corresponding binomial probabilities using a (multivariate) logistic-normal distribution as in LN models. It therefore allows rich covariance structures, as LN models do, while achieving computational efficiency through a Pólya-Gamma augmentation of the binomial models associated with the tree splits. Accordingly, Bayesian inference on LTN can readily proceed by Gibbs sampling. LTN also allows common techniques for effective inference on high-dimensional data to be readily incorporated. We construct a general mixed-effects model using LTN to characterize compositional random effects, which allows flexible covariance among taxa. We demonstrate its use in testing association between microbiome composition and disease risk as well as in estimating the covariance among taxa. We carry out an extensive case study using this LTN-enriched compositional mixed-effects model to analyze a longitudinal dataset from the T1D cohort of the DIABIMMUNE project.
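The generative structure described above can be illustrated with a minimal sketch. It draws the log-odds of all tree splits jointly from a multivariate normal distribution, maps them to binomial probabilities with the logistic function, and then splits the total count down the tree. The four-taxon balanced tree, the names `TREE`, `LEAVES`, and `sample_ltn_counts`, and the parameter values are illustrative assumptions, not taken from the paper. The sketch covers forward sampling only, not the Pólya-Gamma Gibbs sampler.

```python
import numpy as np

# Hypothetical balanced tree over 4 taxa (an assumption for illustration).
# Internal nodes are 0 (root), 1, 2; each maps to (left_child, right_child).
# Nodes 3..6 are leaves, i.e. the observed taxa.
TREE = {0: (1, 2), 1: (3, 4), 2: (5, 6)}
LEAVES = [3, 4, 5, 6]


def sample_ltn_counts(total, mu, sigma, rng):
    """Draw one vector of taxa counts from a logistic-tree-normal-style model.

    total : total number of reads in the sample
    mu    : mean of the log-odds at the internal nodes, shape (3,)
    sigma : covariance of the log-odds, shape (3, 3)
    """
    # Jointly sample the log-odds of going left at every internal node.
    psi = rng.multivariate_normal(mu, sigma)
    # The logistic transform gives the binomial "go left" probabilities.
    theta = 1.0 / (1.0 + np.exp(-psi))

    counts = {0: total}
    # Node ids are ordered so that each parent is visited before its children.
    for node in sorted(TREE):
        left, right = TREE[node]
        n_left = rng.binomial(counts[node], theta[node])
        counts[left] = n_left
        counts[right] = counts[node] - n_left

    return np.array([counts[leaf] for leaf in LEAVES]), theta


rng = np.random.default_rng(0)
counts, theta = sample_ltn_counts(1000, np.zeros(3), np.eye(3), rng)
print(counts, counts.sum())
```

Replacing the identity covariance with a non-diagonal matrix makes the split probabilities at different nodes dependent, which is the kind of cross-taxa covariance that independent splits in a Dirichlet-tree model cannot express.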