Direct covariance matrix estimation with compositional data

Compositional data arise in many areas of research in the natural and biomedical sciences. One prominent example is in the study of the human gut microbiome, where one can measure the relative abundance of many distinct microorganisms in a subject's gut. Often, practitioners are interested in learning how the dependencies between microbes vary across distinct populations or experimental conditions. In statistical terms, the goal is to estimate a covariance matrix for the (latent) log-abundances of the microbes in each of the populations. However, the compositional nature of the data prevents the use of standard estimators for these covariance matrices. In this article, we propose an estimator of multiple covariance matrices which allows for information sharing across distinct populations of samples. Compared to some existing estimators, which estimate the covariance matrices of interest indirectly, our estimator is direct, ensures positive definiteness, and is the solution to a convex optimization problem. We compute our estimator using a proximal-proximal gradient descent algorithm. Asymptotic properties of our estimator reveal that it can perform well in high-dimensional settings. Through simulation studies, we demonstrate that our estimator can outperform existing estimators. We show that our method provides more reliable estimates than competitors in an analysis of microbiome data from subjects with chronic fatigue syndrome.
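
To illustrate why the compositional nature of the data rules out standard covariance estimators, the following is a minimal simulation sketch, not the estimator proposed in the article. It assumes latent log-abundances drawn from a multivariate normal distribution with a hypothetical 3 x 3 covariance matrix `cov`; the function names are likewise illustrative. It shows two well-known facts: the sample covariance of centered log-ratio (clr) transformed compositions is singular, while the variances of pairwise log-ratios depend only on the latent covariance, because the normalizing total cancels in each ratio.

```python
import numpy as np


def simulate_compositions(cov, n, seed=0):
    """Draw latent log-abundances and convert them to compositions."""
    rng = np.random.default_rng(seed)
    p = cov.shape[0]
    log_abund = rng.multivariate_normal(np.zeros(p), cov, size=n)
    abund = np.exp(log_abund)
    # Only relative abundances are observed: each row sums to one.
    comp = abund / abund.sum(axis=1, keepdims=True)
    return log_abund, comp


def clr_covariance(comp):
    """Sample covariance of the centered log-ratio transformed data."""
    log_comp = np.log(comp)
    clr = log_comp - log_comp.mean(axis=1, keepdims=True)
    return np.cov(clr, rowvar=False)


def variation_matrix(comp):
    """Sample variance of log(X_j / X_k) for every pair (j, k)."""
    log_comp = np.log(comp)
    diffs = log_comp[:, :, None] - log_comp[:, None, :]
    return diffs.var(axis=0, ddof=1)


# Hypothetical latent covariance, chosen only for illustration.
cov = np.array([[1.0, 0.5, 0.0],
                [0.5, 1.0, 0.3],
                [0.0, 0.3, 1.0]])

log_abund, comp = simulate_compositions(cov, n=500)

# The clr covariance has rows summing to zero, so it is singular.
clr_cov = clr_covariance(comp)

# The variation matrix, computed from compositions alone, matches
# s_jj + s_kk - 2 * s_jk built from the unobserved log-abundances.
var_mat = variation_matrix(comp)
latent_cov = np.cov(log_abund, rowvar=False)
diag = np.diag(latent_cov)
target = diag[:, None] + diag[None, :] - 2.0 * latent_cov

print("smallest clr eigenvalue:", np.linalg.eigvalsh(clr_cov)[0])
print("max variation mismatch:", np.abs(var_mat - target).max())
```

The variation matrix pins down only the combinations s_jj + s_kk - 2 s_jk, not the individual entries of the latent covariance, which is why recovering the covariance itself requires additional structure of the kind an estimator like the one described above supplies.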
