Envelopes and principal component regression

Envelope methods offer targeted dimension reduction for a variety of models. Their overarching goal is to improve the efficiency of multivariate parameter estimation by projecting the data onto a lower-dimensional subspace known as the envelope. Envelope approaches are advantageous for analyzing data with highly correlated variables, but their iterative Grassmannian optimization algorithms scale poorly to ultra-high-dimensional data. The connections between envelopes and partial least squares in multivariate linear regression have driven recent progress in high-dimensional studies of envelopes. We instead propose a more straightforward approach to envelope modeling from a novel principal components regression perspective. The proposed procedure, Non-Iterative Envelope Component Estimation (NIECE), has substantial computational advantages over iterative Grassmannian optimization in high dimensions. We develop a unified NIECE theory that bridges the gap between envelope methods and principal components in regression. The new theoretical insights also shed light on how the envelope subspace estimation error depends on the eigenvalue gaps of two symmetric positive definite matrices used in envelope modeling. We apply the new theory and algorithm to several envelope models, including response and predictor reduction in multivariate linear models, logistic regression, and the Cox proportional hazards model. Simulations and illustrative data analyses show that NIECE can potentially improve significantly on standard methods in linear and generalized linear models.
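The principal-components view of envelopes can be illustrated with a minimal NumPy sketch. It relies on a standard property of envelopes: when M has distinct eigenvalues, the envelope E_M(U) is spanned by exactly those eigenvectors v_i of M with v_i' U v_i > 0. A non-iterative estimate therefore only needs an eigendecomposition of a sample M and a ranking of its eigenvectors, with no Grassmannian optimization. Several elements are assumptions for illustration and are not the authors' NIECE implementation: the helper names (`envelope_basis`, `predictor_envelope_fit`, `reduced_coefficients`, `sample_covariances`), the ranking rule, the known envelope dimension u, and the simulation settings. The example uses the predictor envelope, with M = S_X and U = S_XY S_XY'.

```python
import numpy as np


def sample_covariances(X, Y):
    """Return the sample covariance S_X and cross-covariance S_XY (divisor n)."""
    n = X.shape[0]
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    return Xc.T @ Xc / n, Xc.T @ Yc / n


def envelope_basis(M_hat, U_hat, u):
    """Hypothetical helper: non-iterative u-dimensional basis estimate for E_M(U).

    Assumes M has distinct eigenvalues, so E_M(U) is spanned by the eigenvectors
    v_i of M with v_i' U v_i > 0. Sample eigenvectors of M_hat are ranked by
    v_i' U_hat v_i and the top u are kept.
    """
    _, V = np.linalg.eigh(M_hat)              # orthonormal eigenvectors as columns
    scores = np.sum(V * (U_hat @ V), axis=0)  # scores[i] = v_i' U_hat v_i
    keep = np.argsort(scores)[::-1][:u]       # indices of the u largest scores
    return V[:, keep]


def reduced_coefficients(X, Y, Gamma):
    """beta_hat = Gamma (Gamma' S_X Gamma)^{-1} Gamma' S_XY for a fixed basis Gamma."""
    S_X, S_XY = sample_covariances(X, Y)
    return Gamma @ np.linalg.solve(Gamma.T @ S_X @ Gamma, Gamma.T @ S_XY)


def predictor_envelope_fit(X, Y, u):
    """Hypothetical predictor-envelope fit with M = S_X and U = S_XY S_XY'.

    At the population level, Sigma_XY Sigma_XY' spans span(Sigma_X beta), and the
    Sigma_X-envelope of that span equals the Sigma_X-envelope of span(beta).
    """
    S_X, S_XY = sample_covariances(X, Y)
    Gamma = envelope_basis(S_X, S_XY @ S_XY.T, u)
    return Gamma, reduced_coefficients(X, Y, Gamma)


# Simulated example (assumed settings): beta lies in span{q_2, q_5},
# which are NOT the top-2 principal components of Sigma_X.
rng = np.random.default_rng(0)
n, p, u = 5000, 6, 2
Q, _ = np.linalg.qr(rng.standard_normal((p, p)))  # random orthonormal eigenvectors
lam = 2.0 ** np.arange(p)[::-1]                    # eigenvalues 32, 16, 8, 4, 2, 1
Sigma_X = Q @ np.diag(lam) @ Q.T
beta = (Q[:, 1] + 3.0 * Q[:, 4]).reshape(p, 1)
X = rng.standard_normal((n, p)) @ np.linalg.cholesky(Sigma_X).T  # rows ~ N(0, Sigma_X)
Y = X @ beta + 0.5 * rng.standard_normal((n, 1))

Gamma_env, beta_env = predictor_envelope_fit(X, Y, u)
S_X, _ = sample_covariances(X, Y)
Gamma_pcr = np.linalg.eigh(S_X)[1][:, ::-1][:, :u]  # top-u sample principal components
beta_pcr = reduced_coefficients(X, Y, Gamma_pcr)

P_true = Q[:, [1, 4]] @ Q[:, [1, 4]].T
dist_env = np.linalg.norm(Gamma_env @ Gamma_env.T - P_true)  # Frobenius distance of projections
err_env = np.linalg.norm(beta_env - beta)
err_pcr = np.linalg.norm(beta_pcr - beta)
print(f"envelope subspace distance: {dist_env:.3f}")
print(f"||beta_env - beta||: {err_env:.3f}   ||beta_pcr - beta||: {err_pcr:.3f}")
```

The sketch differs from ordinary principal components regression in how it selects components. It ranks eigenvectors by v_i' Û v_i rather than by eigenvalue. In this simulated setting the response depends on the second and fifth components. Two-component PCR should therefore miss most of beta, while the eigenvector-ranking envelope estimate should recover it. The accuracy of the selected eigenvectors hinges on the eigenvalue gaps of S_X, which is consistent with the role the abstract assigns to eigenvalue gaps.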