Principal component regression is a popular method when the predictor matrix in a regression is of reduced column rank. It has been proposed to stabilize computation under such conditions, and to improve prediction accuracy by reducing the variance of the least squares estimator for the regression slopes. However, it presents the added difficulty of having to determine which principal components to include in the regression. I provide arguments against selecting the principal components by the magnitude of their associated eigenvalues, by examining the estimator for the residual variance and the contribution of the residual variance to the variance of the estimator for the regression slopes. I show that when a principal component that is important in explaining the response variable is omitted from the regression, the residual variance is overestimated, so that the variance of the estimator for the regression slopes can be higher than that of the ordinary least squares estimator.
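Below is a minimal simulation sketch of the phenomenon described above (my own illustration, not taken from the paper; the eigenvalues, coefficients, and sample size are arbitrary assumptions). The response is constructed to depend only on the smallest-eigenvalue principal component, so selecting components by eigenvalue magnitude discards exactly the component that matters: the residual-variance estimate is inflated, and the retained slope estimates become more variable than under ordinary least squares.

```python
# Sketch only: toy eigenvalues, coefficients, and sample size are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma = 200, 3, 1.0             # sample size, predictors, true residual SD
eigvals = np.array([4.0, 1.0, 0.05])  # population eigenvalues; one is small
beta = np.array([0.0, 0.0, 10.0])     # response loads only on the small-eigenvalue PC

n_rep = 2000
ols_b1, pcr_b1, pcr_s2 = [], [], []
for _ in range(n_rep):
    # Columns are generated directly in principal-component coordinates.
    X = rng.standard_normal((n, p)) * np.sqrt(eigvals)
    y = X @ beta + sigma * rng.standard_normal(n)

    b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS on all components
    ols_b1.append(b_ols[0])

    Xk = X[:, :2]                                   # keep only large-eigenvalue PCs
    b_pcr, *_ = np.linalg.lstsq(Xk, y, rcond=None)
    resid = y - Xk @ b_pcr
    pcr_s2.append(resid @ resid / (n - 2))          # residual-variance estimate
    pcr_b1.append(b_pcr[0])

print(f"true sigma^2:                {sigma**2:.3f}")
print(f"mean PCR residual variance:  {np.mean(pcr_s2):.3f}")  # overestimated (~6)
print(f"sampling var of OLS slope 1: {np.var(ols_b1):.5f}")
print(f"sampling var of PCR slope 1: {np.var(pcr_b1):.5f}")   # larger than OLS
```

In this setup the omitted component contributes roughly beta_3^2 * lambda_3 = 5 of unexplained variance, so the residual-variance estimate centers near 6 rather than the true 1, and that inflation propagates into the sampling variance of the retained slope estimates, consistent with the argument summarized above.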