Stochastic Gradient Descent (SGD) based methods have been widely used for training large-scale machine learning models that also generalize well in practice. Several explanations have been offered for this generalization performance, a prominent one being algorithmic stability [18]. However, there are no known examples of smooth loss functions for which the analysis can be shown to be tight. Furthermore, apart from the properties of the loss function, the data distribution has also been shown to be an important factor in generalization performance. This raises the question: is the stability analysis of [18] tight for smooth functions, and if not, for what kinds of loss functions and data distributions can the stability analysis be improved? In this paper, we first settle open questions regarding the tightness of bounds in the data-independent setting: we show that for general datasets, the existing analysis for convex and strongly convex loss functions is tight, but it can be improved for non-convex loss functions. Next, we give novel and improved data-dependent bounds: we show stability upper bounds for a large class of convex regularized loss functions, with negligible regularization parameters, and improve existing data-dependent bounds in the non-convex setting. We hope that our results will initiate further efforts to better understand the data-dependent setting under non-convex loss functions, leading to an improved understanding of the generalization abilities of deep networks.
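For context, a minimal LaTeX sketch of the uniform-stability notion underlying the analysis of [18], stated with standard (assumed) definitions; the symbols $A$, $S$, $S'$, $f$, $z$, $R$, $\widehat{R}_S$, and $\epsilon$ are illustrative notation introduced here, not the paper's own.

\begin{verbatim}
% Sketch only: assumes amsthm with \newtheorem{definition}{Definition};
% symbols are illustrative, not the paper's notation.
\begin{definition}[Uniform stability]
A randomized algorithm $A$ is $\epsilon$-uniformly stable if, for all datasets
$S, S'$ of size $n$ differing in at most one example,
\[
  \sup_{z} \; \mathbb{E}_{A}\!\left[ f\!\left(A(S); z\right) - f\!\left(A(S'); z\right) \right] \le \epsilon .
\]
Uniform stability controls the expected generalization gap:
\[
  \left| \, \mathbb{E}_{S,A}\!\left[ R\!\left(A(S)\right) - \widehat{R}_S\!\left(A(S)\right) \right] \right| \le \epsilon ,
\]
where $R$ and $\widehat{R}_S$ denote the population and empirical risks.
\end{definition}
\end{verbatim}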