We study instrumental variable regression in data-rich environments. The goal is to estimate a linear model from many noisy covariates and many noisy instruments. Our key assumption is that the true covariates and true instruments are repetitive, though possibly different in nature: each reflects a few underlying factors, but those factors may be misaligned. We analyze a family of estimators based on two-stage least squares with spectral regularization: the first stage learns canonical correlations between covariates and instruments, and the second stage uses them as regressors. As a theoretical contribution, we derive upper and lower bounds on estimation error, proving optimality of the method with noisy data. As a practical contribution, we provide guidance on which types of spectral regularization to use in different regimes.
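To make the two-stage structure concrete, the following is a minimal sketch of one possible spectrally regularized two-stage least squares estimator. It is not the estimator analyzed in the paper. It assumes hard truncation of singular values (principal-component style) as the regularizer, and it replaces the canonical-correlation step with a simpler projection onto the leading singular directions of the instruments. The function names `truncate_svd`, `spectral_2sls`, and `simulate`, the rank parameters, and the simulated design are all hypothetical choices made for illustration.

```python
import numpy as np


def truncate_svd(M, rank):
    """Return the leading `rank` singular triplets of M."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :rank], s[:rank], Vt[:rank]


def spectral_2sls(X, Z, y, rank_x, rank_z):
    """Illustrative two-stage least squares with hard spectral truncation.

    X: (n, p) noisy covariates, Z: (n, q) noisy instruments, y: (n,) outcome.
    rank_x and rank_z are assumed tuning parameters (numbers of factors).
    """
    # Denoise the instruments: keep only their leading left singular vectors.
    Uz, _, _ = truncate_svd(Z, rank_z)
    # First stage: project the covariates onto the denoised instrument space.
    X_fit = Uz @ (Uz.T @ X)
    # Regularize the fitted covariates by truncating to rank_x.
    Ux, sx, Vxt = truncate_svd(X_fit, rank_x)
    # Second stage: minimum-norm least squares of y on the truncated fit.
    return Vxt.T @ ((Ux.T @ y) / sx)


def simulate(n=500, p=30, q=40, r=3, noise=0.5, seed=0):
    """Hypothetical low-rank design with an unobserved confounder V."""
    rng = np.random.default_rng(seed)
    Fz = rng.normal(size=(n, r))              # instrument factors
    V = rng.normal(size=(n, r))               # confounder, drives X and y
    Fx = Fz @ rng.normal(size=(r, r)) + 0.5 * V
    Lx = rng.normal(size=(r, p))
    X_true = Fx @ Lx
    Z_true = Fz @ rng.normal(size=(r, q))
    beta = Lx.T @ rng.normal(size=r) / p      # lies in the row space of X_true
    y = X_true @ beta + V @ np.ones(r) + rng.normal(size=n)
    X = X_true + noise * rng.normal(size=(n, p))
    Z = Z_true + noise * rng.normal(size=(n, q))
    return X, Z, y, beta


X, Z, y, beta = simulate()
beta_hat = spectral_2sls(X, Z, y, rank_x=3, rank_z=3)
print("relative error:", np.linalg.norm(beta_hat - beta) / np.linalg.norm(beta))
```

Hard truncation is only one member of the spectral regularization family. Under the same assumptions, a ridge-style variant would replace the division by `sx` with `sx / (sx**2 + lam)` for some penalty `lam`.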