In this paper we study the asymptotics of linear regression in settings with non-Gaussian covariates where the covariates exhibit a linear dependency structure, departing from the standard assumption of independence. We model the covariates using stochastic processes with spatio-temporal covariance and analyze the performance of ridge regression in the high-dimensional proportional regime, where the number of samples and feature dimensions grow proportionally. A Gaussian universality theorem is proven, demonstrating that the asymptotics are invariant under replacing the non-Gaussian covariates with Gaussian vectors preserving mean and covariance, for which tools from random matrix theory can be used to derive precise characterizations of the estimation error. The estimation error is characterized by a fixed-point equation involving the spectral properties of the spatio-temporal covariance matrices, enabling efficient computation. We then study optimal regularization, overparameterization, and the double descent phenomenon in the context of dependent data. Simulations validate our theoretical predictions, shedding light on how dependencies influence estimation error and the choice of regularization parameters.
翻译:暂无翻译