Stability of SGD: Tightness Analysis and Improved Bounds

Methods based on Stochastic Gradient Descent (SGD) are widely used to train large-scale machine learning models that also generalize well in practice. Several explanations have been offered for this generalization performance, a prominent one being algorithmic stability [18]. However, there are no known examples of smooth loss functions for which the analysis can be shown to be tight. Furthermore, apart from the properties of the loss function, the data distribution has also been shown to be an important factor in generalization performance. This raises the question: is the stability analysis of [18] tight for smooth functions, and if not, for what kinds of loss functions and data distributions can the stability analysis be improved? In this paper we first settle open questions regarding the tightness of bounds in the data-independent setting: we show that for general datasets, the existing analysis for convex and strongly convex loss functions is tight, but it can be improved for non-convex loss functions. Next, we give novel and improved data-dependent bounds: we show stability upper bounds for a large class of convex regularized loss functions, with negligible regularization parameters, and improve existing data-dependent bounds in the non-convex setting. We hope that our results will initiate further efforts to better understand the data-dependent setting under non-convex loss functions, leading to an improved understanding of the generalization abilities of deep networks.
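To make the notion of algorithmic stability concrete, the following is a minimal sketch, not the paper's construction. It runs SGD with the same sampling order on two datasets that differ in a single example and reports how far the learned parameters drift apart. The one-dimensional least-squares model, the synthetic data, the replaced example, and all function names are assumptions chosen for illustration.

```python
import random


def make_dataset(n, seed):
    """Synthetic 1-D regression data: y = 2x + small Gaussian noise (illustrative)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = 2.0 * x + rng.gauss(0.0, 0.1)
        data.append((x, y))
    return data


def sgd_least_squares(data, steps, lr, seed):
    """SGD on the smooth convex loss 0.5 * (w * x - y)^2 with a scalar weight w."""
    rng = random.Random(seed)  # a fixed seed fixes the order of sampled indices
    w = 0.0
    n = len(data)
    for _ in range(steps):
        x, y = data[rng.randrange(n)]
        grad = (w * x - y) * x
        w -= lr * grad
    return w


def parameter_divergence(n, steps, lr, data_seed=0, sgd_seed=123):
    """|w_S - w_S'| where S' is S with one example replaced (hypothetical setup)."""
    s = make_dataset(n, data_seed)
    s_prime = list(s)
    s_prime[0] = (1.0, -2.0)  # swap in a single, deliberately different example
    w = sgd_least_squares(s, steps, lr, sgd_seed)
    w_prime = sgd_least_squares(s_prime, steps, lr, sgd_seed)
    return abs(w - w_prime)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, parameter_divergence(n, steps=200, lr=0.1))
```

Stability bounds of the kind discussed in the abstract control this divergence (through the resulting change in loss) in expectation over the sampling order; a single run like the one above only illustrates the quantity being bounded.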


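Related content

Loss function (machine learning)

A loss function, also called a distance function or metric function in AI, measures distance in an abstract sense: the error between the true data and the predicted data. It quantifies how much a model's prediction f(x) disagrees with the true value Y. It is a non-negative real-valued function, usually written L(Y, f(x)); a smaller loss means the model's predictions fit the data more closely. The loss function is the core of the empirical risk function and an important component of the structural risk function.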
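As a small illustration of the definition above, the sketch below computes a loss L(Y, f(x)), the empirical risk (the average loss over a dataset), and a structural risk (empirical risk plus a regularization term). The squared loss, the linear model f(x) = w * x, the L2 penalty, and the parameter name `lam` are assumptions chosen for the example.

```python
def squared_loss(y_true, y_pred):
    """L(Y, f(x)) = (Y - f(x))^2, which is always non-negative."""
    return (y_true - y_pred) ** 2


def empirical_risk(w, data):
    """Average squared loss of the linear model f(x) = w * x over the dataset."""
    return sum(squared_loss(y, w * x) for x, y in data) / len(data)


def structural_risk(w, data, lam):
    """Empirical risk plus an L2 regularization term lam * w^2."""
    return empirical_risk(w, data) + lam * w ** 2


if __name__ == "__main__":
    data = [(1.0, 2.0), (2.0, 4.0)]
    print(empirical_risk(2.0, data))
    print(structural_risk(2.0, data, lam=0.5))
```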
