This paper follows up on recent work by Neu (2021) and presents new and tighter information-theoretic upper bounds on the generalization error of machine learning models, such as neural networks, trained with SGD. We apply these bounds to analyze the generalization behaviour of linear and two-layer ReLU networks. Experimental studies based on these bounds provide some insights into the SGD training of neural networks. They also point to a new and simple regularization scheme which we show performs comparably to the current state of the art.