Risk-Monotonicity via Distributional Robustness

Data acquisition is difficult in most applications of Machine Learning (ML), so it is natural to hope and expect that the population risk decreases (performance improves) as the number of data points grows. Somewhat surprisingly, this is not the case even for the most standard algorithms, such as the Empirical Risk Minimizer (ERM). Non-monotonic behaviour of the risk and instability in training have also appeared in the popular deep learning paradigm under the name of double descent. These problems not only highlight our lack of understanding of learning algorithms and generalization, but also render our efforts at data acquisition in vain. It is therefore crucial to pursue this concern and provide a characterization of such behaviour. In this paper, we derive the first consistent and risk-monotonic algorithms for a general statistical learning setting under weak assumptions, consequently resolving an open problem (Viering et al. 2019) on how to avoid non-monotonic behaviour of risk curves. Our algorithms make use of Distributionally Robust Optimization (DRO), a technique that has shown promise in other areas of deep learning such as adversarial training. Our work makes a significant contribution to the topic of risk-monotonicity, which may be key in resolving empirical phenomena such as double descent.
