The human microbiome plays an important role in human health and disease. Next-generation sequencing technologies make it possible to quantify the composition of the human microbiome. Clustering these microbiome data can provide valuable information by identifying underlying patterns across samples. Recently, Fang and Subedi (2020) proposed a logistic normal multinomial mixture model (LNM-MM) for clustering microbiome data. Because microbiome data tend to be high dimensional, we develop a family of logistic normal multinomial factor analyzers (LNM-FA) by incorporating a factor analyzer structure into the LNM-MM. This family of models is better suited to high-dimensional data because the number of parameters in LNM-FA can be greatly reduced by assuming that the number of latent factors is small. Parameter estimation uses a computationally efficient variant of the alternating expectation conditional maximization algorithm that utilizes variational Gaussian approximations. The proposed method is illustrated using simulated and real datasets.
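To make the parameter saving concrete, the following is a minimal sketch that counts free covariance parameters for a single mixture component. It assumes the standard factor analyzer decomposition, in which a d-dimensional covariance matrix is written as Lambda Lambda^T + Psi with a d x q loading matrix Lambda and a diagonal Psi. The abstract does not spell out this decomposition, so treat it as an assumption; the function names and the example dimensions are illustrative and do not come from the paper.

```python
def full_cov_params(d: int) -> int:
    """Free parameters in an unconstrained symmetric d x d covariance matrix."""
    return d * (d + 1) // 2


def factor_cov_params(d: int, q: int) -> int:
    """Free parameters in Lambda Lambda^T + Psi (assumed decomposition).

    Lambda contributes d*q loadings, the diagonal Psi contributes d variances,
    and q*(q-1)/2 is subtracted because Lambda is only identified up to rotation.
    """
    if not 0 < q <= d:
        raise ValueError("q must satisfy 0 < q <= d")
    return d * q + d - q * (q - 1) // 2


if __name__ == "__main__":
    d, q = 50, 3  # hypothetical: 50 latent dimensions, 3 latent factors
    print("full covariance:", full_cov_params(d))
    print("factor analyzer:", factor_cov_params(d, q))
```

Under these assumptions, with d = 50 and q = 3 the unconstrained covariance has 1275 free parameters per component, while the factor analyzer form has 197. The gap widens as d grows with q held fixed.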