We establish risk bounds for Regularized Empirical Risk Minimizers (RERM) when the loss is Lipschitz and convex and the regularization function is a norm. In the first part, we obtain these results in the i.i.d. setup under subgaussian assumptions on the design. In the second part, we consider a more general framework where the design may have heavier tails and the data may be corrupted by outliers, both in the design and in the response variables. In this situation, RERM performs poorly in general. We analyse an alternative procedure, based on median-of-means principles and called minmax MOM. We show that these estimators achieve optimal subgaussian deviation rates in this relaxed setting. The main results are meta-theorems allowing a wide range of applications to various problems in learning theory. To show a non-exhaustive sample of these potential applications, we apply them to classification problems with the logistic loss regularized by LASSO and SLOPE, and to regression problems with the Huber loss regularized by Group LASSO and Total Variation. Another advantage of the minmax MOM formulation is that it suggests a systematic way to slightly modify descent-based algorithms used in high-dimensional statistics so as to make them robust to outliers. We illustrate this principle in a Simulations section, where minmax MOM versions of classical proximal descent algorithms are turned into outlier-robust algorithms.
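To make the last point concrete, the following is a minimal sketch of the kind of modification the abstract alludes to: a proximal gradient iteration in which the empirical gradient is replaced by a median-of-means step. The sketch assumes a linear model with the Huber loss and an ℓ1 (LASSO) penalty; the block count K, the step size, the Huber threshold delta, and the function names are illustrative choices, not the paper's tuned values or notation.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (the LASSO penalty)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def mom_proximal_gradient(X, y, K=10, lam=0.1, step=0.01, n_iter=200, seed=0):
    """Hypothetical MOM variant of proximal gradient descent for the
    Huber-loss LASSO. At each iteration the sample is split into K
    random blocks; the gradient is computed on the block whose
    empirical loss is the median of the K block losses, so a few
    corrupted observations cannot dominate the descent direction."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    delta = 1.0  # Huber threshold (illustrative choice)
    for _ in range(n_iter):
        blocks = np.array_split(rng.permutation(n), K)
        # Empirical Huber loss on each block.
        losses = []
        for b in blocks:
            r = X[b] @ w - y[b]
            huber = np.where(np.abs(r) <= delta,
                             0.5 * r ** 2,
                             delta * (np.abs(r) - 0.5 * delta))
            losses.append(huber.mean())
        # The block achieving the median loss drives the gradient step.
        med = blocks[int(np.argsort(losses)[K // 2])]
        r = X[med] @ w - y[med]
        grad = X[med].T @ np.clip(r, -delta, delta) / len(med)
        # Proximal (soft-thresholding) step for the ell_1 penalty.
        w = soft_threshold(w - step * grad, step * lam)
    return w
```

The only change relative to a classical proximal descent scheme is the block-selection step: the full-sample gradient is swapped for the gradient of the median block, which is the sense in which existing algorithms are "slightly modified" to gain robustness to outliers.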