Discrete data, such as counts of microbiome taxa obtained from next-generation sequencing, are routinely encountered in bioinformatics. Taxa count data in microbiome studies are typically high-dimensional and over-dispersed, and they reveal only relative abundances; they are therefore treated as compositional. Analyzing compositional data presents many challenges because such data are restricted to a simplex. In a logistic normal multinomial model, the relative abundances are mapped from the simplex to a latent variable in real Euclidean space using the additive log-ratio transformation. While the logistic normal multinomial approach brings flexibility to modeling the data, it comes with a heavy computational cost because parameter estimation typically relies on Bayesian techniques. In this paper, we develop a novel mixture of logistic normal multinomial models for clustering microbiome data. Additionally, we use an efficient parameter estimation framework based on variational Gaussian approximations (VGA). Adopting a variational Gaussian approximation for the posterior of the latent variable substantially reduces the computational overhead. The proposed method is illustrated on simulated and real datasets.
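To make the additive log-ratio (ALR) mapping and the generative side of the model concrete, here is a minimal sketch in Python/NumPy. For a composition p with K parts, the ALR transform gives z_k = log(p_k / p_K) for k = 1, ..., K-1, and its inverse (the additive logistic map) sends z back onto the simplex. The sketch assumes the last taxon is the reference part (the abstract does not say which reference is used), and `sample_lnm_mixture` is a hypothetical helper that only simulates counts from a mixture of logistic normal multinomials with made-up parameters. It does not implement the paper's VGA-based estimation.

```python
import numpy as np


def alr(p, ref=-1):
    """Additive log-ratio: map compositions on the K-part simplex to R^(K-1).

    Assumption: the reference part defaults to the last taxon.
    """
    p = np.asarray(p, dtype=float)
    ref = ref % p.shape[-1]
    others = np.delete(p, ref, axis=-1)
    # log(p_k / p_ref) for every non-reference part k
    return np.log(others) - np.log(np.take(p, [ref], axis=-1))


def alr_inv(z, ref=-1):
    """Inverse ALR (additive logistic): map R^(K-1) back onto the simplex."""
    z = np.asarray(z, dtype=float)
    K = z.shape[-1] + 1
    ref = ref % K
    # Re-insert a zero log-ratio for the reference part, then apply a softmax.
    full = np.insert(z, ref, 0.0, axis=-1)
    full = full - full.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(full)
    return e / e.sum(axis=-1, keepdims=True)


def sample_lnm_mixture(weights, mus, Sigmas, totals, rng):
    """Draw counts from a G-component mixture of logistic normal multinomials.

    Hypothetical helper for illustration only; not the paper's estimation code.
    """
    weights = np.asarray(weights, dtype=float)
    labels = rng.choice(len(weights), size=len(totals), p=weights)
    counts = []
    for n, g in zip(totals, labels):
        # Latent Gaussian variable on the ALR scale for this sample's component.
        z = rng.multivariate_normal(mus[g], Sigmas[g])
        p = alr_inv(z)  # relative abundances on the simplex
        counts.append(rng.multinomial(n, p))  # observed taxa counts
    return np.array(counts), labels


# Usage with made-up parameters: 2 components, 3 latent dimensions -> 4 taxa.
rng = np.random.default_rng(0)
mus = [np.array([1.0, 0.0, -1.0]), np.array([-1.0, 0.5, 1.0])]
Sigmas = [0.5 * np.eye(3), 0.5 * np.eye(3)]
counts, labels = sample_lnm_mixture([0.4, 0.6], mus, Sigmas, totals=[500] * 10, rng=rng)
print(counts.shape)  # (10, 4)
```

Working on the ALR scale is what lets each latent variable be Gaussian in unconstrained Euclidean space; the multinomial layer then accounts for the discreteness of the observed counts.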