Acquiring data is a difficult task in many applications of machine learning, and it is only natural to hope and expect the population risk to decrease (i.e., performance to improve) monotonically as more data points become available. Somewhat surprisingly, it turns out that this is not the case, even for the most standard algorithms that minimize the empirical risk. Non-monotonic behavior of the risk and instability in training have appeared in the popular deep-learning paradigm under the name of double descent. These observations highlight the current lack of understanding of learning algorithms and generalization. It is therefore crucial to pursue this concern and to characterize such behavior. In this paper, we derive the first consistent and risk-monotonic algorithms for a general statistical learning setting under weak assumptions, thereby resolving an open problem of Viering et al. (2019) on how to avoid non-monotonic risk curves. We further show that risk monotonicity need not come at the price of worse excess-risk rates. To achieve this, we derive new empirical-Bernstein-like concentration inequalities of independent interest that hold for certain non-i.i.d. processes, such as martingale difference sequences.
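For context, and using notation assumed here rather than taken from the paper body: risk monotonicity asks that the expected population risk $R$ of the hypothesis returned by a learning algorithm $A$ never increase as the sample $S_n$ grows,
\[
  \mathbb{E}\bigl[R\bigl(A(S_{n+1})\bigr)\bigr] \;\le\; \mathbb{E}\bigl[R\bigl(A(S_{n})\bigr)\bigr] \qquad \text{for all } n \ge 1,
\]
and a standard i.i.d. empirical Bernstein inequality, the kind of bound the abstract describes extending to non-i.i.d. processes, states that for $X_1,\dots,X_n$ i.i.d. in $[0,1]$ with sample mean $\bar{X}_n$ and sample variance $V_n$, with probability at least $1-\delta$,
\[
  \mathbb{E}[X_1] - \bar{X}_n \;\le\; \sqrt{\frac{2 V_n \ln(2/\delta)}{n}} \;+\; \frac{7 \ln(2/\delta)}{3(n-1)}.
\]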