An influential line of recent work has focused on the generalization properties of unregularized gradient-based learning procedures applied to separable linear classification with exponentially-tailed loss functions. The ability of such methods to generalize well has been attributed to their implicit bias towards large-margin predictors, both asymptotically and in finite time. We give an additional, unified explanation for this generalization and relate it to two simple properties of the optimization objective, which we refer to as realizability and self-boundedness. We introduce a general setting of unconstrained stochastic convex optimization with these properties, and analyze the generalization of gradient methods through the lens of algorithmic stability. In this broader setting, we obtain sharp stability bounds for gradient descent and stochastic gradient descent that apply even after a very large number of gradient steps, and use them to derive general generalization bounds for these algorithms. Finally, as direct applications of the general bounds, we return to the setting of linear classification with separable data and establish several novel test-loss and test-accuracy bounds for gradient descent and stochastic gradient descent, for a variety of loss functions with different tail decay rates. In some of these cases, our bounds significantly improve upon the existing generalization error bounds in the literature.
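To make the two named properties concrete, here is one plausible formalization; the notation ($F$ for the objective, $\beta$ for the constant) is our own and is not taken from the abstract, which does not state the formal definitions.

```latex
% Hypothetical formalization of the two properties named above;
% F denotes the (nonnegative) optimization objective.
\begin{itemize}
  \item \textbf{Realizability:} $\inf_{w} F(w) = 0$, i.e., the objective can be
        driven arbitrarily close to zero -- as happens with an exponentially-tailed
        loss on linearly separable data, where scaling up a separating
        predictor sends the loss to zero.
  \item \textbf{Self-boundedness:} $\|\nabla F(w)\|^2 \le \beta\, F(w)$ for all $w$.
        This holds, for instance, for any $H$-smooth nonnegative function with
        $\beta = 2H$, by the standard inequality
        $\|\nabla F(w)\|^2 \le 2H\, F(w)$.
\end{itemize}
```

Under such conditions, small objective value forces small gradients, which is one natural route to the stability bounds the abstract describes.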