We analyze principal component regression (PCR) in a high-dimensional error-in-variables setting with fixed design. Under suitable conditions, we show that PCR consistently identifies the unique model with minimum $\ell_2$-norm and is near minimax optimal. These results enable us to establish non-asymptotic out-of-sample prediction guarantees that improve upon the best known rates. In our analysis, we introduce a natural linear algebraic condition between the in- and out-of-sample covariates, which allows us to avoid distributional assumptions. Our simulations illustrate the importance of this condition for generalization, even under covariate shifts. As a byproduct, our results also yield new guarantees for the synthetic controls literature, a leading approach for policy evaluation. In particular, our minimax results suggest the attractiveness of PCR-based methods among the numerous variants. To the best of our knowledge, prediction guarantees of this kind for the fixed design setting have been elusive in both the high-dimensional error-in-variables and synthetic controls literatures.
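For readers less familiar with PCR, a minimal sketch of the estimator in an error-in-variables setup may help. Everything here is an illustrative assumption rather than the paper's experimental design: the synthetic low-rank covariates, the Gaussian noise levels, the known rank `k`, and the helper name `pcr_fit` are all hypothetical. The training and test covariates are drawn to share a row space, which is only one plausible way to link in- and out-of-sample covariates; the exact condition used in the analysis is not stated in the abstract.

```python
import numpy as np

def pcr_fit(Z, y, k):
    """Hypothetical helper: regress y on the top-k principal subspace of Z."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    # beta = V_k diag(1/s_k) U_k^T y, i.e. the pseudo-inverse of the
    # rank-k truncation of Z applied to y.
    return Vt[:k].T @ ((U[:, :k].T @ y) / s[:k])

rng = np.random.default_rng(0)
n, m, p, r = 100, 50, 200, 3  # assumed sizes; p > n is the high-dimensional regime

# Assumed latent low-rank covariates; train and test rows share a row space.
V = rng.standard_normal((r, p))
X_train = rng.standard_normal((n, r)) @ V
X_test = rng.standard_normal((m, r)) @ V

# A model vector inside that row space (assumed for this illustration).
beta_star = V.T @ rng.standard_normal(r)
y_train = X_train @ beta_star + 0.1 * rng.standard_normal(n)

# Error-in-variables: only noisy covariates are observed.
Z_train = X_train + 0.1 * rng.standard_normal((n, p))
Z_test = X_test + 0.1 * rng.standard_normal((m, p))

beta_hat = pcr_fit(Z_train, y_train, k=r)
y_pred = Z_test @ beta_hat

# Compare predictions with the noiseless out-of-sample signal.
signal = X_test @ beta_star
rel_err = np.linalg.norm(y_pred - signal) / np.linalg.norm(signal)
print(f"relative out-of-sample error: {rel_err:.3f}")
```

Because the estimate is built from the top-`k` right singular vectors only, it has no component outside the retained subspace, which is why this construction returns the minimum $\ell_2$-norm solution of the rank-`k` regression.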