Acquisition of data is a difficult task in many applications of machine learning, and it is only natural that one hopes and expects the population risk to decrease (better performance) monotonically with an increasing number of data points. It turns out, somewhat surprisingly, that this is not the case even for the most standard algorithms that minimize the empirical risk. Non-monotonic behavior of the risk and instability in training have been observed in the popular deep learning paradigm under the name of double descent. These problems highlight the current lack of understanding of learning algorithms and generalization. It is, therefore, crucial to pursue this concern and provide a characterization of such behavior. In this paper, we derive the first consistent and risk-monotonic (in high probability) algorithms for a general statistical learning setting under weak assumptions, consequently answering some of the questions posed by Viering et al. (2019) on how to avoid non-monotonic behavior of risk curves. We further show that risk monotonicity need not come at the price of worse excess-risk rates. To achieve this, we derive new empirical Bernstein-like concentration inequalities of independent interest that hold for certain non-i.i.d. processes such as Martingale Difference Sequences.
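As a point of reference, one natural formalization of high-probability risk monotonicity is sketched below; the notation ($\mathcal{A}$ for the learning algorithm, $S_n$ for the first $n$ observations, $\delta$ for the confidence level) is illustrative and may differ from the exact quantifiers used in the paper:
\[
\Pr\!\left[ R\big(\mathcal{A}(S_{n+1})\big) \le R\big(\mathcal{A}(S_n)\big) \right] \ge 1 - \delta
\quad \text{for every sample size } n,
\]
where $R(h) = \mathbb{E}_{Z \sim \mathcal{D}}\big[\ell(h, Z)\big]$ denotes the population risk of a hypothesis $h$ under the data distribution $\mathcal{D}$ and loss $\ell$. In words: adding one more data point should, with high probability, not increase the population risk of the learned hypothesis.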