Generalization is a central problem in Machine Learning. Indeed, most prediction methods require careful calibration of hyperparameters, usually carried out on a held-out \textit{validation} dataset, to generalize well. The main goal of this paper is to introduce a novel approach that achieves generalization without any data splitting, based on a new risk measure which directly quantifies a model's tendency to overfit. To convey the intuition and advantages of this new approach, we illustrate it in the simple linear regression model ($Y=X\beta+\xi$), for which we develop a new criterion. We highlight how this criterion is a good proxy for the true generalization risk. Next, we derive different procedures which tackle several structures simultaneously (correlation, sparsity, ...). Notably, these procedures \textbf{concomitantly} train the model and calibrate the hyperparameters. In addition, these procedures can be implemented via classical gradient descent methods whenever the criterion is differentiable w.r.t. the hyperparameters. Our numerical experiments reveal that our procedures are computationally feasible and compare favorably, in terms of generalization, to the popular approach (Ridge, LASSO, and Elastic-Net combined with grid-search cross-validation). They also outperform this baseline on two additional tasks: estimation and support recovery of $\beta$. Moreover, our procedures do not require any expertise for the calibration of the initial parameters, which remain the same across all the datasets we experimented on.
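To make the "concomitant training and calibration" idea concrete, the sketch below runs gradient descent on the logarithm of a ridge hyperparameter $\lambda$, minimizing a differentiable criterion evaluated on the training data alone, with no validation split. The paper's actual criterion is not reproduced here: as a stand-in we use a Mallows'-$C_p$-style proxy (residual sum of squares plus a degrees-of-freedom penalty), which is an assumption of this illustration, not the authors' risk measure. The log-reparameterization keeps $\lambda$ positive, and a finite-difference gradient stands in for the analytic derivative.

```python
import numpy as np

# Toy linear regression data Y = X beta + xi (all sizes are illustrative).
rng = np.random.default_rng(0)
n, p, sigma = 50, 10, 1.0
X = rng.standard_normal((n, p))
beta_true = np.concatenate([np.ones(3), np.zeros(p - 3)])
y = X @ beta_true + sigma * rng.standard_normal(n)

def ridge_beta(lam):
    """Ridge estimator for a given hyperparameter lambda."""
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def criterion(lam):
    """Hypothetical generalization proxy (Cp-style), computed on
    training data only: fit term + 2*sigma^2*df(lambda)."""
    beta = ridge_beta(lam)
    df = np.trace(X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T))
    return np.sum((y - X @ beta) ** 2) + 2 * sigma**2 * df

# Gradient descent on theta = log(lambda); the model (beta) and the
# hyperparameter (lambda) are updated in the same loop, with no
# held-out validation set.
theta, lr, eps = 0.0, 0.1, 1e-4
best_lam, best_crit = np.exp(theta), criterion(np.exp(theta))
for _ in range(300):
    # Central finite difference of the criterion w.r.t. theta.
    g = (criterion(np.exp(theta + eps)) - criterion(np.exp(theta - eps))) / (2 * eps)
    theta -= lr * g
    c = criterion(np.exp(theta))
    if c < best_crit:
        best_lam, best_crit = np.exp(theta), c

beta_hat = ridge_beta(best_lam)  # model trained at the calibrated lambda
```

In the paper's procedures the criterion is differentiable w.r.t. the hyperparameters, so the finite difference above would be replaced by an exact gradient; this sketch only illustrates the structure of a single joint optimization loop replacing grid-search cross-validation.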