Generalization analyses of deep learning typically assume that training converges to a fixed point. However, recent results indicate that, in practice, the weights of deep neural networks optimized with stochastic gradient descent often oscillate indefinitely. To reduce this discrepancy between theory and practice, this paper focuses on the generalization of neural networks whose training dynamics do not necessarily converge to fixed points. Our main contribution is to propose a notion of statistical algorithmic stability (SAS) that extends classical algorithmic stability to non-convergent algorithms and to study its connection to generalization. This ergodic-theoretic approach yields new insights compared to the traditional optimization and learning theory perspectives. We prove that the stability of a learning algorithm's time-asymptotic behavior relates to its generalization, and we empirically demonstrate how loss dynamics can provide clues to generalization performance. Our findings provide evidence that networks that "train stably generalize better," even when training continues indefinitely and the weights do not converge.
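As a rough, hypothetical illustration of the kind of quantity such an analysis considers (this is not the paper's SAS estimator), the sketch below compares time-averaged tail losses of SGD runs on two training sets that differ in a single example, in the spirit of algorithmic stability under non-convergent dynamics. All function names (make_data, sgd_loss_trajectory, tail_average), hyperparameters, and the choice of the tail-averaged loss gap as a stability proxy are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n=200, d=5):
    # Toy binary-classification data; purely illustrative.
    X = rng.normal(size=(n, d))
    w_true = rng.normal(size=d)
    y = (X @ w_true > 0).astype(float)
    return X, y

def sgd_loss_trajectory(X, y, steps=5000, lr=0.5):
    # Constant-step SGD on logistic loss; record the full-batch training
    # loss at every step (the "loss dynamics"), which need not converge.
    n, d = X.shape
    w = np.zeros(d)
    losses = np.empty(steps)
    for t in range(steps):
        i = rng.integers(n)
        p = 1.0 / (1.0 + np.exp(-(X[i] @ w)))
        w -= lr * (p - y[i]) * X[i]
        z = X @ w
        # Numerically stable logistic loss: log(1 + exp(z)) - y * z
        losses[t] = np.mean(np.maximum(z, 0) + np.log1p(np.exp(-np.abs(z))) - y * z)
    return losses

def tail_average(losses, burn_in=0.5):
    # Time average over the tail of training: an ergodic-style statistic of
    # the time-asymptotic behavior, rather than a value at a converged point.
    start = int(len(losses) * burn_in)
    return losses[start:].mean()

# Two training sets differing in a single example (leave-one-out style perturbation).
X, y = make_data()
X_pert, y_pert = X.copy(), y.copy()
X_pert[0] = rng.normal(size=X.shape[1])
y_pert[0] = 1.0 - y_pert[0]

gap = abs(tail_average(sgd_loss_trajectory(X, y))
          - tail_average(sgd_loss_trajectory(X_pert, y_pert)))
print(f"Tail-averaged loss gap under a one-example perturbation: {gap:.4f}")
```

Under these assumptions, a small gap indicates that the algorithm's long-run loss statistics are insensitive to replacing one training example, which is the intuition behind relating stability of time-asymptotic behavior to generalization.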