Dimension reduction for high-dimensional compositional data plays an important role in many fields, where the principal component analysis of the basis covariance matrix is of scientific interest. In practice, however, the basis variables are latent and rarely observed, and standard techniques of principal component analysis are inadequate for compositional data because of the simplex constraint. To address this challenging problem, we relate the principal subspace of the centered log-ratio compositional covariance to that of the basis covariance, and prove that the latter is approximately identifiable as the dimensionality diverges, under a subspace sparsity assumption. This blessing-of-dimensionality phenomenon enables us to propose principal subspace estimation methods based on the sample centered log-ratio covariance. We also derive nonasymptotic error bounds for the subspace estimators, which exhibit a tradeoff between identification and estimation. Moreover, we develop efficient proximal alternating direction method of multipliers algorithms to solve the resulting nonconvex and nonsmooth optimization problems. Simulation results demonstrate that the proposed methods perform as well as the oracle methods with known basis. Their usefulness is illustrated through an analysis of word usage patterns for statisticians.
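The relation between the two covariances can be illustrated numerically. The sketch below is not the proposed estimator: it omits the sparsity constraint and the proximal ADMM solver, and uses a plain eigendecomposition instead. It relies only on the standard identity that the centered log-ratio (clr) transform of a composition equals the centered log-basis, so the clr covariance is G Ω G with G = I - (1/p) 1 1^T and Ω the log-basis covariance. The simulated data, the loading pattern, and all function and variable names are assumptions made for illustration.

```python
import numpy as np


def clr(compositions):
    """Centered log-ratio transform of each row (one composition per row)."""
    log_x = np.log(compositions)
    return log_x - log_x.mean(axis=1, keepdims=True)


def top_eigenvectors(cov, r):
    """Orthonormal basis of the leading r-dimensional eigenspace of a symmetric matrix."""
    _, vecs = np.linalg.eigh(cov)      # eigenvalues are returned in ascending order
    return vecs[:, ::-1][:, :r]        # reorder to descending and keep the first r


def subspace_distance(u, v):
    """Frobenius distance between the projection matrices onto span(u) and span(v)."""
    return np.linalg.norm(u @ u.T - v @ v.T, ord="fro")


rng = np.random.default_rng(0)
n, p, r = 200, 50, 2

# Hypothetical latent log-basis: two sparse factors plus isotropic noise.
loadings = np.zeros((p, r))
loadings[:5, 0] = 3.0
loadings[5:10, 1] = 3.0
log_basis = rng.normal(size=(n, r)) @ loadings.T + rng.normal(size=(n, p))

# Only the compositions (basis normalized to the simplex) would be observed.
basis = np.exp(log_basis)
compositions = basis / basis.sum(axis=1, keepdims=True)

# clr(X) equals the log-basis multiplied by the centering matrix G.
G = np.eye(p) - np.ones((p, p)) / p
clr_data = clr(compositions)
identity_holds = np.allclose(clr_data, log_basis @ G)

# Consequently the sample clr covariance equals G times the sample basis covariance times G.
clr_cov = np.cov(clr_data, rowvar=False)
basis_cov = np.cov(log_basis, rowvar=False)
cov_identity_holds = np.allclose(clr_cov, G @ basis_cov @ G)

# Compare the leading subspaces of the observable and the latent covariance.
u_clr = top_eigenvectors(clr_cov, r)
u_basis = top_eigenvectors(basis_cov, r)
distance = subspace_distance(u_clr, u_basis)
print(identity_holds, cov_identity_holds, distance)
```

The printed distance measures how far the leading subspace computed from observable compositions is from the one computed from the latent basis, which is only available in simulation. Rerunning with larger p is one informal way to probe the identification side of the tradeoff described above.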