Complexity is a fundamental concept in statistical learning theory that aims to inform generalization performance. Parameter count, while successful in low-dimensional settings, is not well justified in overparameterized settings, where the number of parameters exceeds the number of training samples. We revisit complexity measures based on Rissanen's principle of minimum description length (MDL) and define a novel MDL-based complexity (MDL-COMP) that remains valid for overparameterized models. MDL-COMP is defined via an optimality criterion over the encodings induced by a good ridge estimator class. We provide an extensive theoretical characterization of MDL-COMP for linear models and kernel methods and show that it is not just a function of parameter count, but rather a function of the singular values of the design or kernel matrix and the signal-to-noise ratio. For a linear model with $n$ observations, $d$ parameters, and i.i.d. Gaussian predictors, MDL-COMP scales linearly with $d$ when $d<n$, but the scaling is exponentially smaller, namely $\log d$, when $d>n$. For kernel methods, we show that MDL-COMP informs the minimax in-sample error and can decrease as the dimensionality of the input increases. We also prove that MDL-COMP upper bounds the in-sample mean squared error (MSE). Via an array of simulations and real-data experiments, we show that a data-driven Prac-MDL-COMP informs hyper-parameter tuning for optimizing test MSE with ridge regression in limited-data settings, sometimes improving upon cross-validation and (always) saving computational costs. Finally, our findings also suggest that the recently observed double descent phenomena in overparameterized models might be a consequence of the choice of non-ideal estimators.
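To make the idea of tuning ridge regression with a codelength-style criterion concrete, here is a minimal sketch. The abstract does not state the Prac-MDL-COMP formula, so the objective below is an assumption: it is the negative log marginal likelihood (up to constants) of a linear model with a Gaussian prior on the coefficients, a standard quantity that depends on the data only through the ridge fit and the singular values of the design matrix. The function names, the noise variance `sigma2`, and the grid of candidate `lam` values are all hypothetical choices made for illustration.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Ridge solution computed through the SVD; valid for d < n and d > n."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Each singular direction is shrunk by the factor s / (s^2 + lam).
    return Vt.T @ ((s / (s**2 + lam)) * (U.T @ y))


def codelength_objective(X, y, lam, sigma2=1.0):
    """Assumed codelength-style objective (not taken from the abstract).

    Sum of the residual term, the ridge penalty, and a log-determinant
    term over the eigenvalues s^2 of X^T X, normalized by n.
    """
    n = X.shape[0]
    theta = ridge_fit(X, y, lam)
    s = np.linalg.svd(X, compute_uv=False)
    fit = np.sum((y - X @ theta) ** 2) / (2.0 * sigma2)
    penalty = lam * np.sum(theta**2) / (2.0 * sigma2)
    log_det = 0.5 * np.sum(np.log1p(s**2 / lam))
    return (fit + penalty + log_det) / n


def select_lambda(X, y, grid, sigma2=1.0):
    """Return the grid value minimizing the objective, and its score."""
    scores = [codelength_objective(X, y, lam, sigma2) for lam in grid]
    best = int(np.argmin(scores))
    return grid[best], scores[best]


# Illustrative overparameterized problem (d > n) with Gaussian predictors.
rng = np.random.default_rng(0)
n, d = 50, 200
X = rng.standard_normal((n, d))
theta_star = rng.standard_normal(d) / np.sqrt(d)
y = X @ theta_star + rng.standard_normal(n)

lam_grid = np.logspace(-3, 3, 25)
lam_hat, score = select_lambda(X, y, lam_grid)
```

Unlike K-fold cross-validation, a criterion of this kind needs only one fit per candidate `lam` on the full training set, which is consistent with the computational savings the abstract mentions.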