Random matrix theory has become a widely useful tool in high-dimensional statistics and theoretical machine learning. However, random matrix theory is largely focused on the proportional asymptotics in which the number of columns grows proportionally to the number of rows of the data matrix. This is not always the most natural setting in statistics, where columns correspond to covariates and rows to samples. With the objective of moving beyond the proportional asymptotics, we revisit ridge regression ($\ell_2$-penalized least squares) on i.i.d. data $(x_i, y_i)$, $i\le n$, where $x_i$ is a feature vector and $y_i = \beta^\top x_i +\epsilon_i \in\mathbb{R}$ is a response. We allow the feature vector to be high-dimensional, or even infinite-dimensional, in which case it belongs to a separable Hilbert space, and assume $z_i := \Sigma^{-1/2}x_i$ to either have i.i.d. entries or satisfy a certain convex concentration property. Within this setting, we establish non-asymptotic bounds that approximate the bias and variance of ridge regression in terms of the bias and variance of an `equivalent' sequence model (a regression model with diagonal design matrix). The approximation holds up to multiplicative factors of the form $1\pm \Delta$, for an explicitly small $\Delta$. Previously, such an approximation result was known only in the proportional regime and only up to additive errors: in particular, it did not allow one to characterize the behavior of the excess risk when the latter converges to $0$. Our general theory recovers earlier results in the proportional regime (with better error rates). As a new application, we obtain a completely explicit and sharp characterization of ridge regression for Hilbert covariates with regularly varying spectrum. Finally, we analyze the overparametrized near-interpolation setting and obtain sharp `benign overfitting' guarantees.
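To make the objects in the abstract concrete, the following is a minimal sketch in the finite-dimensional case, written in one common parametrization from the ridge regression literature; the normalizations and exact definitions used in the body of the paper may differ. Let $X\in\mathbb{R}^{n\times p}$ be the matrix with rows $x_i^\top$, let $y=(y_1,\dots,y_n)^\top$, and let $\sigma_\epsilon^2 := \mathbb{E}[\epsilon_i^2]$. The ridge estimator with penalty $\lambda>0$ is
\begin{equation*}
\hat\beta_\lambda := \arg\min_{b\in\mathbb{R}^p}\Big\{\frac{1}{n}\sum_{i=1}^n\big(y_i - b^\top x_i\big)^2 + \lambda\|b\|_2^2\Big\}
= \Big(\frac{1}{n}X^\top X + \lambda I\Big)^{-1}\frac{1}{n}X^\top y\, ,
\end{equation*}
and its excess prediction risk at a fresh feature vector $x$ with $\mathbb{E}[xx^\top]=\Sigma$, conditionally on $X$ and averaging over the noise, decomposes as
\begin{equation*}
R_X(\lambda) := \mathbb{E}\big[(x^\top\hat\beta_\lambda - x^\top\beta)^2\,\big|\,X\big]
= \underbrace{\big\|\Sigma^{1/2}\big(\mathbb{E}[\hat\beta_\lambda\,|\,X]-\beta\big)\big\|_2^2}_{\mathsf{Bias}}
+ \underbrace{\operatorname{tr}\big(\Sigma\,\operatorname{Cov}(\hat\beta_\lambda\,|\,X)\big)}_{\mathsf{Variance}}\, .
\end{equation*}
In the `equivalent' sequence model, these two random quantities are approximated by deterministic functions of the spectrum of $\Sigma$ and of $\beta$, through an effective regularization $\lambda_*\ge\lambda$ defined (in one common convention) as the unique positive solution of
\begin{equation*}
\lambda_*\Big(1 - \frac{1}{n}\operatorname{tr}\big(\Sigma(\Sigma+\lambda_*)^{-1}\big)\Big) = \lambda\, ,
\end{equation*}
leading to approximations of the form
\begin{equation*}
\mathsf{Bias}\approx \frac{\lambda_*^2\,\langle\beta,(\Sigma+\lambda_*)^{-2}\Sigma\,\beta\rangle}{1-\frac{1}{n}\operatorname{tr}\big(\Sigma^2(\Sigma+\lambda_*)^{-2}\big)}\, ,\qquad
\mathsf{Variance}\approx \sigma_\epsilon^2\,\frac{\frac{1}{n}\operatorname{tr}\big(\Sigma^2(\Sigma+\lambda_*)^{-2}\big)}{1-\frac{1}{n}\operatorname{tr}\big(\Sigma^2(\Sigma+\lambda_*)^{-2}\big)}\, ,
\end{equation*}
with $\approx$ standing for equality up to the $1\pm\Delta$ multiplicative factors mentioned above.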