We study the generalization properties of unregularized gradient methods applied to separable linear classification -- a setting that has received considerable attention since the pioneering work of Soudry et al. (2018). We establish tight upper and lower (population) risk bounds for gradient descent in this setting, for any smooth loss function, expressed in terms of its tail decay rate. Our bounds take the form $\Theta\big(r_{\ell,T}^2/(\gamma^2 T) + r_{\ell,T}^2/(\gamma^2 n)\big)$, where $T$ is the number of gradient steps, $n$ is the size of the training set, $\gamma$ is the data margin, and $r_{\ell,T}$ is a complexity term that depends on the tail decay rate of the loss function (and on $T$). Our upper bound matches the best known upper bounds due to Shamir (2021); Schliserman and Koren (2022), while extending their applicability to virtually any smooth loss function and relaxing the technical assumptions they impose. Our risk lower bounds are the first in this context and establish the tightness of our upper bounds for any given tail decay rate and in all parameter regimes. The proof technique used to show these results is also markedly simpler compared to previous work, and is straightforward to extend to other gradient methods; we illustrate this by providing analogous results for Stochastic Gradient Descent.
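As an illustrative instantiation (under the assumption, consistent with the exponentially-tailed case analyzed by Shamir (2021), that the complexity term satisfies $r_{\ell,T} = \Theta(\log T)$ for the logistic loss), the bound specializes to
$$
\Theta\!\left(\frac{\log^2 T}{\gamma^2 T} + \frac{\log^2 T}{\gamma^2 n}\right),
$$
so that running gradient descent for $T \approx n$ steps yields population risk of order $\log^2 n / (\gamma^2 n)$.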