Sufficient dimension reduction (SDR) methods aim to identify a dimension reduction subspace (DRS) that preserves all the information about the conditional distribution of a response given its predictors. Traditional SDR methods determine the DRS by solving a method-specific generalized eigenvalue problem and retaining the eigenvectors associated with the largest eigenvalues. In this article, we argue against the long-standing convention of using eigenvalues as the measure of subspace importance and propose alternative ordering criteria that directly assess the predictive relevance of each subspace. For a binary response, we introduce a subspace ordering criterion based on the absolute value of the independent two-sample Student's t-statistic. Theoretically, our criterion identifies subspaces that achieve a local minimum of the Bayes error rate and yields a consistent ordering of directions under mild regularity conditions. Additionally, we employ an F-statistic to provide a framework that unifies categorical and continuous responses under a single subspace criterion. We evaluate the proposed criteria within multiple SDR methods through extensive simulation studies and real-data applications. Our empirical results demonstrate the efficacy of reordering subspaces with the proposed criteria, which generally improves classification accuracy and subspace estimation relative to ordering by eigenvalues.
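The reordering idea can be illustrated with a short sketch. This is a minimal illustration under our own assumptions, not the authors' implementation. Candidate directions are taken as given, for example eigenvectors returned by some SDR method. Each direction is scored by the absolute pooled two-sample Student's t-statistic of the projected predictors for a binary response, or by a one-way ANOVA F-statistic across response groups. Directions are then sorted by decreasing score. For a continuous response, we assume it is first discretized into slices, as in sliced inverse regression, which may differ from the construction used in the article. All function names, including the `slice_labels` helper, are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def project(X, direction):
    """Project each row of X (a list of lists) onto a direction vector."""
    return [sum(xi * bi for xi, bi in zip(row, direction)) for row in X]


def pooled_t(z, y):
    """Independent two-sample Student's t-statistic (pooled variance) of z split by binary y in {0, 1}."""
    g1 = [zi for zi, yi in zip(z, y) if yi == 1]
    g0 = [zi for zi, yi in zip(z, y) if yi == 0]
    n1, n0 = len(g1), len(g0)  # each group needs at least two observations
    sp2 = ((n1 - 1) * variance(g1) + (n0 - 1) * variance(g0)) / (n1 + n0 - 2)
    return (mean(g1) - mean(g0)) / sqrt(sp2 * (1 / n1 + 1 / n0))


def one_way_f(z, labels):
    """One-way ANOVA F-statistic of z across the groups defined by labels."""
    groups = {}
    for zi, li in zip(z, labels):
        groups.setdefault(li, []).append(zi)
    n, k = len(z), len(groups)
    grand = mean(z)
    ssb = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())  # between-group SS
    ssw = sum(sum((v - mean(g)) ** 2 for v in g) for g in groups.values())  # within-group SS
    return (ssb / (k - 1)) / (ssw / (n - k))


def slice_labels(y, n_slices):
    """Hypothetical helper: bin a continuous response into roughly equal-count slices."""
    order = sorted(range(len(y)), key=lambda i: y[i])
    labels = [0] * len(y)
    for rank, i in enumerate(order):
        labels[i] = rank * n_slices // len(y)
    return labels


def order_directions(X, y, directions, criterion="t"):
    """Return direction indices sorted by decreasing |t| (criterion="t") or F (otherwise)."""
    scores = []
    for b in directions:
        z = project(X, b)
        scores.append(abs(pooled_t(z, y)) if criterion == "t" else one_way_f(z, y))
    # Stable sort: ties keep the original (e.g. eigenvalue-based) order.
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


if __name__ == "__main__":
    # Toy data: only the second coordinate separates the two classes.
    X = [[0.0, 0.0], [1.0, 0.1], [2.0, -0.1], [3.0, 0.0],
         [0.5, 2.0], [1.5, 2.1], [2.5, 1.9], [3.5, 2.0]]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    dirs = [[1.0, 0.0], [0.0, 1.0]]  # pretend eigenvalues ranked dirs[0] first
    print(order_directions(X, y, dirs))  # [1, 0]
```

With two groups, the one-way F-statistic equals the square of the pooled t-statistic, so both criteria produce the same ordering for a binary response; the F version extends directly to more than two classes or to sliced continuous responses, passed in as `slice_labels(y, n_slices)`.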