We introduce a probabilistic model, called the "logistic-tree normal" (LTN), for microbiome compositional data. The LTN marries two popular classes of models -- the logistic-normal (LN) and the Dirichlet-tree (DT) -- and inherits the key benefits of both. LN models are flexible in characterizing rich covariance structure among taxa but can be computationally prohibitive in the face of high dimensionality (i.e., when the number of taxa is large) because they lack conjugacy with the multinomial sampling model. The DT, on the other hand, avoids this issue by decomposing the multinomial sampling model into a collection of binomials, one at each split of the phylogenetic tree of the taxa, and adopting a conjugate beta model for each binomial probability; the price is a restrictive covariance structure among the taxa. The LTN model decomposes the multinomial model into binomials as the DT does, but it jointly models the corresponding binomial probabilities using a (multivariate) LN distribution instead of betas. It therefore allows rich covariance structures, as LN models do, while the decomposition of the multinomial likelihood allows conjugacy to be restored through Pólya-Gamma augmentation. Accordingly, Bayesian inference on the LTN model can readily proceed by Gibbs sampling. Moreover, the multivariate Gaussian aspect of the model allows common techniques for effective inference on high-dimensional data -- such as those based on sparsity and low-rank assumptions in the covariance structure -- to be readily incorporated. Depending on the goal of the analysis, the LTN model can be used either as a standalone model or embedded into more sophisticated models. We demonstrate its use in estimating taxa covariance and in mixed-effects modeling. Finally, we carry out a case study using an LTN-based mixed-effects model to analyze a longitudinal dataset from the DIABIMMUNE project.
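The tree decomposition described above can be illustrated with a minimal sketch. It is not the authors' implementation: the four-taxon tree, the `SPLITS` layout, and the function names `leaf_probs`, `node_counts`, and `sample_ltn` are all assumptions made for illustration. The sketch draws the log-odds at the three internal nodes from a multivariate Gaussian, maps them to leaf (taxon) probabilities, and reduces a count vector to the per-node binomial counts that the decomposition works with.

```python
import numpy as np

# Hypothetical 4-taxon tree (an assumption for illustration only).
# Each internal node is (leaves under left child, leaves under right child):
#   node 0 (root): {0, 1} | {2, 3}
#   node 1:        {0}    | {1}
#   node 2:        {2}    | {3}
SPLITS = [((0, 1), (2, 3)), ((0,), (1,)), ((2,), (3,))]
N_LEAVES = 4


def leaf_probs(psi):
    """Map per-node log-odds psi to taxon probabilities.

    theta[k] = logistic(psi[k]) is the probability of descending to the
    left child at node k; a leaf's probability is the product of the
    branch probabilities along its path from the root.
    """
    theta = 1.0 / (1.0 + np.exp(-np.asarray(psi, dtype=float)))
    p = np.ones(N_LEAVES)
    for (left, right), t in zip(SPLITS, theta):
        p[list(left)] *= t
        p[list(right)] *= 1.0 - t
    return p


def node_counts(y):
    """Reduce taxon counts y to (left-child count, node total) per node.

    These pairs are the binomial "successes" and "trials" at each split.
    """
    y = np.asarray(y)
    return [
        (int(y[list(left)].sum()), int(y[list(left + right)].sum()))
        for left, right in SPLITS
    ]


def sample_ltn(mu, sigma, total, rng):
    """Draw one sample: Gaussian log-odds, then multinomial counts."""
    psi = rng.multivariate_normal(mu, sigma)
    y = rng.multinomial(total, leaf_probs(psi))
    return psi, y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Correlated log-odds across nodes: the covariance a beta prior cannot express.
    sigma = np.array([[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]])
    psi, y = sample_ltn(np.zeros(3), sigma, total=100, rng=rng)
    print("log-odds:", psi)
    print("counts:", y)
    print("per-node (left, total):", node_counts(y))
```

Given the per-node counts, each split contributes a binomial likelihood with a logistic link, which is the form that Pólya-Gamma augmentation makes conditionally Gaussian. The augmentation and Gibbs steps are omitted here.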