We investigate high-dimensional linear regression in settings where the noise is correlated with Gaussian covariates. In regression models, this correlation is called endogeneity; it arises from unobserved variables, among other causes, and has been a central problem in causal inference and econometrics. When the covariates are high-dimensional, it has been common to assume sparsity of the true parameters and to estimate them with regularization, even under endogeneity. However, when sparsity does not hold, it is not well understood how to control endogeneity and high dimensionality simultaneously. In this paper, we demonstrate that an estimator without regularization can achieve consistency, i.e., benign overfitting, under certain assumptions on the covariance matrix. Specifically, we show that the error of this estimator converges to zero when the covariance matrices of the correlated noise and the instrumental variables satisfy a condition on their eigenvalues. We consider several extensions that relax these conditions and conduct experiments to support our theoretical findings. As a technical contribution, we apply the convex Gaussian minimax theorem (CGMT) to our dual problem and extend the CGMT itself.