We study the generalization properties of the popular stochastic optimization method known as stochastic gradient descent (SGD) for optimizing general non-convex loss functions. Our main contribution is providing upper bounds on the generalization error that depend on local statistics of the stochastic gradients evaluated along the path of iterates calculated by SGD. The key factors our bounds depend on are the variance of the gradients (with respect to the data distribution), the local smoothness of the objective function along the SGD path, and the sensitivity of the loss function to perturbations of the final output. Our key technical tool is a combination of the information-theoretic generalization bounds previously used for analyzing randomized variants of SGD with a perturbation analysis of the iterates.
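For concreteness, the iterates and the local gradient statistic referred to above can be sketched as follows; the notation (step size $\eta$, per-example loss $\ell$, data distribution $\mathcal{D}$, sample $z_t$) is illustrative and not taken verbatim from the paper:
\[
  w_{t+1} \;=\; w_t - \eta\, \nabla \ell(w_t, z_t),
\]
with the local gradient variance along the SGD path given by
\[
  \sigma_t^2 \;=\; \mathbb{E}_{z\sim \mathcal{D}}\!\left[\, \big\|\nabla \ell(w_t, z) - \nabla L_{\mathcal{D}}(w_t)\big\|^2 \,\right],
  \qquad L_{\mathcal{D}}(w) \;=\; \mathbb{E}_{z\sim \mathcal{D}}\!\left[\ell(w, z)\right].
\]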