Benign Overfitting of Non-Sparse High-Dimensional Linear Regression with Correlated Noise

We investigate the high-dimensional linear regression problem in settings where the noise is correlated with the Gaussian covariates. In regression models, correlation between the noise and the covariates is called endogeneity. It arises from unobserved variables, among other causes, and has long been a central problem in causal inference and econometrics. When the covariates are high-dimensional, it is common to assume that the true parameter is sparse and to estimate it with regularization, even in the presence of endogeneity. However, when sparsity does not hold, it is not well understood how to control endogeneity and high dimensionality simultaneously. In this paper, we demonstrate that an estimator without regularization can achieve consistency, i.e., benign overfitting, under certain assumptions on the covariance matrix. Specifically, we show that the error of this estimator converges to zero when the covariance matrices of the correlated noise and the instrumental variables satisfy a condition on their eigenvalues. We consider several extensions that relax these conditions and conduct experiments to support our theoretical findings. As a technical contribution, we apply the convex Gaussian minimax theorem (CGMT) to our dual problem and extend the CGMT itself.
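The effect of endogeneity can be seen in a small simulation. The sketch below is a low-dimensional illustration only (p = 3, far fewer than n samples), not the paper's high-dimensional unregularized estimator. It assumes a hypothetical unobserved variable U that enters both the covariates X and the noise, so ordinary least squares (OLS) is biased. A classical two-stage least squares (2SLS) estimator, using instruments Z that are independent of the noise, recovers the true parameter. The data-generating process and the function names (`simulate`, `ols`, `two_stage_least_squares`) are assumptions chosen for illustration.

```python
import numpy as np


def simulate(n=20000, seed=0):
    """Hypothetical endogenous design: U drives both X and the noise."""
    rng = np.random.default_rng(seed)
    p = 3
    beta = np.array([1.0, -2.0, 0.5])            # true parameter (illustrative)
    Z = rng.standard_normal((n, p))              # instruments, independent of the noise
    U = rng.standard_normal((n, 1))              # unobserved confounder
    X = Z + U + 0.5 * rng.standard_normal((n, p))  # covariates depend on U -> endogenous
    eps = 2.0 * U[:, 0] + 0.5 * rng.standard_normal(n)  # noise correlated with X via U
    y = X @ beta + eps
    return X, Z, y, beta


def ols(X, y):
    """Ordinary least squares; biased here because Cov(X, eps) != 0."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def two_stage_least_squares(X, Z, y):
    """Classical 2SLS: regress X on Z, then regress y on the fitted X."""
    gamma = np.linalg.lstsq(Z, X, rcond=None)[0]  # stage 1: (p, p) coefficients
    X_hat = Z @ gamma                             # projection of X onto span(Z)
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]  # stage 2


if __name__ == "__main__":
    X, Z, y, beta = simulate()
    print("OLS error :", np.linalg.norm(ols(X, y) - beta))
    print("2SLS error:", np.linalg.norm(two_stage_least_squares(X, Z, y) - beta))
```

Under this assumed design, Cov(X) = 1.25 I + 11ᵀ and Cov(X, ε) = 2·1, so the population OLS bias is 2/4.25 ≈ 0.47 in each coordinate and does not shrink as n grows, whereas the 2SLS error does. When the number of covariates or instruments exceeds n, these least-squares problems typically no longer have unique solutions; that high-dimensional, non-sparse regime is the one the abstract addresses.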
