Methods for supervised principal component analysis (SPCA) aim to incorporate label information into principal component analysis (PCA), so that the extracted features are more useful for a prediction task of interest. Prior work on SPCA has focused primarily on optimizing prediction error and has neglected the value of maximizing the variance explained by the extracted features. We propose a new method for SPCA that addresses both of these objectives jointly, and demonstrate empirically that our approach dominates existing approaches, i.e., outperforms them with respect to both prediction error and variance explained. Our approach accommodates arbitrary supervised learning losses and, through a statistical reformulation, provides a novel low-rank extension of generalized linear models.
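To make the two evaluation criteria concrete, the following is a minimal sketch of how a candidate projection could be scored on both variance explained and prediction error. It is not the proposed method: the helper names `evaluate_projection` and `pca_directions`, the synthetic data, and the choice of in-sample squared error under ordinary least squares are all assumptions made for illustration.

```python
import numpy as np


def evaluate_projection(X, y, U):
    """Score an orthonormal projection U (p x k) on the two criteria.

    Returns (variance_explained, mse): the fraction of the total variance of
    the centered X captured by Z = X U, and the in-sample mean squared error
    of an ordinary least-squares fit of the centered y on Z.
    Squared error is an illustrative choice of supervised loss.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    Z = Xc @ U
    variance_explained = np.sum(Z ** 2) / np.sum(Xc ** 2)
    coef, *_ = np.linalg.lstsq(Z, yc, rcond=None)
    mse = np.mean((yc - Z @ coef) ** 2)
    return variance_explained, mse


def pca_directions(X, k):
    """Top-k principal directions of X (columns of a p x k matrix)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T


# Hypothetical synthetic data: the label depends on the lowest-variance
# feature, so unsupervised PCA is free to ignore the predictive direction.
rng = np.random.default_rng(0)
n, p, k = 200, 10, 2
X = rng.normal(size=(n, p)) * np.linspace(3.0, 0.5, p)
y = X[:, -1] + 0.1 * rng.normal(size=n)

U_pca = pca_directions(X, k)
ve_pca, mse_pca = evaluate_projection(X, y, U_pca)
print(f"PCA (k={k}): variance explained = {ve_pca:.3f}, MSE = {mse_pca:.3f}")
```

Under these assumptions, plain PCA is optimal for variance explained at a given rank but carries no guarantee on prediction error, which is the gap an SPCA method is meant to close; a method that dominates would score at least as well on both returned quantities.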