Principal component analysis (PCA) is a versatile dimension-reduction tool with wide applications in the statistics and machine learning communities. It is particularly useful for modeling high-dimensional data in which the number of variables $p$ is comparable to, or much larger than, the sample size $n$. Despite the extensive literature on this topic, research has focused on modeling static principal eigenvectors or subspaces, which is unsuitable for stochastic processes that are dynamic in nature. To characterize changes over the whole course of high-dimensional data collection, we propose a unified framework for estimating dynamic principal subspaces spanned by the leading eigenvectors of covariance matrices. Within this framework, we formulate an optimization problem that combines kernel smoothing and a regularization penalty with an orthogonality constraint, and solve it effectively by a proximal gradient method for manifold optimization. We show that our method is suitable for high-dimensional data observed under both common and irregular designs. In addition, theoretical properties of the estimators are investigated under $\ell_q$ ($0 \leq q \leq 1$) sparsity. Extensive experiments on both simulated and real data demonstrate the effectiveness of the proposed method.
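As a concrete illustration of the pipeline described above, the following Python snippet estimates a sparse principal subspace at a fixed time point by combining a kernel-smoothed covariance estimate with proximal gradient steps followed by a retraction onto the Stiefel manifold. This is a minimal sketch under stated assumptions: the function names, the Gaussian kernel, the polar retraction, and all parameter values are illustrative choices, not the paper's actual algorithm or tuning.

```python
import numpy as np

def smoothed_cov(X, times, t, h):
    # Gaussian-kernel-weighted covariance at time t.
    # X is n x p (assumed mean-zero), times is an n-vector of observation times.
    w = np.exp(-0.5 * ((times - t) / h) ** 2)
    w /= w.sum()
    return (X * w[:, None]).T @ X

def soft_threshold(A, tau):
    # Proximal operator of tau * ||A||_1 (entrywise soft-thresholding).
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def polar_retraction(A):
    # Map A back to the Stiefel manifold via its polar factor.
    U, _, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ Vt

def dynamic_subspace(X, times, t, r, h, lam=0.1, step=0.05, iters=200):
    # Sketch of a manifold proximal gradient estimate of the r-dimensional
    # principal subspace at time t; hypothetical parameter defaults.
    S = smoothed_cov(X, times, t, h)
    _, V = np.linalg.eigh(S)
    U = V[:, -r:]                 # initialize from leading eigenvectors
    for _ in range(iters):
        G = -2.0 * S @ U          # Euclidean gradient of -tr(U^T S U)
        U = soft_threshold(U - step * G, step * lam)  # sparsity-inducing prox
        U = polar_retraction(U)   # restore orthonormal columns
    return U
```

Calling `dynamic_subspace` over a grid of time points $t$ traces out an estimate of the time-varying subspace; the retraction after each soft-thresholding step is one simple way to enforce the orthogonality constraint in the spirit of proximal gradient methods on manifolds.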