A tree-based model for addressing sparsity and taxa covariance in microbiome compositional count data

Microbiome compositional data are often high-dimensional and sparse, and they exhibit pervasive cross-sample heterogeneity. Generative modeling is a popular approach to analyzing such data, and effective generative models must accurately characterize these key features. While high dimensionality and the abundance of zeros have received much attention, existing models often lack the flexibility to capture complex cross-sample variability. This limitation can reduce statistical efficiency and lead to misleading conclusions in tasks such as differential abundance analysis, clustering, and network analysis. We introduce a generative model, the "logistic-tree normal" (LTN) model, which addresses this issue and effectively captures key characteristics of microbiome data, including the abundance of zeros. LTN employs a tree-based decomposition to aggregate sparse taxa counts and places a (multivariate) logistic-normal distribution on the tree splits, allowing the covariance among taxa to be adjusted flexibly as needed. The latent Gaussian structure of LTN enables the incorporation of multivariate analysis tools that enforce sparsity or low-rank covariance assumptions. As a versatile, fully generative model, LTN supports a wide range of applications and admits efficient computational recipes for Bayesian inference through conjugate blocked Gibbs sampling with Pólya-Gamma augmentation. We demonstrate the application of LTN in a compositional mixed-effects model for differential abundance analysis, using numerical experiments and a reanalysis of the infant cohort in the DIABIMMUNE study. Our findings illustrate that LTN, by adequately accounting for cross-sample heterogeneity, generates an appropriate proportion of zeros without requiring an explicit zero-inflation component, confirming a recent viewpoint that "zero-inflation" in count-based sequencing data is often the result of unaccounted-for cross-sample variation.
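
The generative mechanism summarized above (a tree that splits counts, with logistic-normal probabilities at the splits) can be sketched in a few lines. The following is a minimal, hypothetical illustration, not the authors' software. It assumes a binary tree over taxa, a fixed total read count per sample, and binomial splits whose logits psi_A (one per internal node) are drawn jointly from a multivariate normal N(mu, Sigma) for each sample. The tree, mu, Sigma, and all function names are illustrative choices. The sketch covers data generation only, not the Pólya-Gamma Gibbs sampler.

```python
import math
import random

# Hypothetical 4-taxon binary tree (illustration only): a leaf is a taxon
# name (str); an internal node is a (left, right) tuple.
TREE = ((("t1", "t2"), "t3"), "t4")


def count_internal(tree):
    """Number of internal nodes (splits); K leaves give K - 1 splits."""
    if isinstance(tree, str):
        return 0
    return 1 + count_internal(tree[0]) + count_internal(tree[1])


def cholesky(a):
    """Lower-triangular L with L L^T = a, for symmetric positive-definite a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def mvn(mu, L, rng):
    """One draw from N(mu, L L^T) computed as mu + L z with z ~ N(0, I)."""
    z = [rng.gauss(0.0, 1.0) for _ in mu]
    return [m + sum(L[i][k] * z[k] for k in range(i + 1)) for i, m in enumerate(mu)]


def logistic(x):
    """Numerically stable inverse logit."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def binomial(n, p, rng):
    """Binomial(n, p) draw as a sum of n Bernoulli(p) trials (fine for small n)."""
    return sum(rng.random() < p for _ in range(n))


def split_counts(tree, total, psi_iter, rng, out):
    """Send `total` reads down the tree. At split A the left child receives
    Binomial(total, logistic(psi_A)) reads and the right child the rest.
    Logits are consumed in preorder: node, then left subtree, then right."""
    if isinstance(tree, str):
        out[tree] = total
        return
    theta = logistic(next(psi_iter))
    left_total = binomial(total, theta, rng)
    split_counts(tree[0], left_total, psi_iter, rng, out)
    split_counts(tree[1], total - left_total, psi_iter, rng, out)


def sample_ltn(tree, n_reads, mu, sigma, n_samples, seed=0):
    """Simulate n_samples count vectors, each with its own psi ~ N(mu, sigma)."""
    rng = random.Random(seed)
    L = cholesky(sigma)
    samples = []
    for _ in range(n_samples):
        psi = mvn(mu, L, rng)  # sample-specific logits model cross-sample variation
        out = {}
        split_counts(tree, n_reads, iter(psi), rng, out)
        samples.append(out)
    return samples


def zero_fraction(samples):
    """Share of (sample, taxon) cells with a zero count."""
    cells = [c for s in samples for c in s.values()]
    return sum(c == 0 for c in cells) / len(cells)


if __name__ == "__main__":
    k = count_internal(TREE)
    mu = [0.0] * k
    low = [[0.01 if i == j else 0.0 for j in range(k)] for i in range(k)]
    high = [[9.0 if i == j else 3.0 for j in range(k)] for i in range(k)]
    for name, sigma in (("low variance", low), ("high variance", high)):
        samples = sample_ltn(TREE, n_reads=50, mu=mu, sigma=sigma, n_samples=200)
        print(name, "zero fraction:", zero_fraction(samples))
```

Comparing the near-degenerate Sigma with the high-variance Sigma in this sketch gives an informal check of the point made in the abstract: with the same tree and sequencing depth, larger cross-sample variation in the split logits pushes more (sample, taxon) cells to zero, with no separate zero-inflation component in the model.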
