Over the past decade, characterizing the exact asymptotic risk of regularized estimators in high-dimensional regression has emerged as a popular line of work. This literature considers the proportional asymptotics framework, where the number of features and the number of samples both diverge, proportionally to each other. Substantial work in this area relies on Gaussianity assumptions on the observed covariates. Further, these studies often assume the design entries to be independent and identically distributed. Parallel research investigates the universality of these findings, revealing that results based on the i.i.d.~Gaussian assumption extend to a broad class of designs, such as i.i.d.~sub-Gaussian designs. However, universality results for dependent covariates have so far focused on correlation-based dependence or on a highly structured form of dependence, as permitted by right rotationally invariant designs. In this paper, we break this barrier and study a dependence structure that in general falls outside the purview of these established classes. We seek to pin down the extent to which results based on i.i.d.~Gaussian assumptions persist. We identify a class of designs, characterized by a block dependence structure, that ensures the universality of i.i.d.~Gaussian-based results. We establish that the optimal value of the regularized empirical risk and the risk associated with convex regularized estimators, such as the Lasso and ridge, converge to the same limits under block dependent designs as they do for designs with i.i.d.~Gaussian entries. Our dependence structure differs significantly from correlation-based dependence, and enables, for the first time, asymptotically exact risk characterization in prevalent nonparametric regression problems in high dimensions. Finally, we illustrate through experiments that this universality becomes evident quite early, even for relatively moderate sample sizes.