Flatness of the loss curve is conjectured to be connected to the generalization ability of machine learning models, in particular of neural networks. While it has been empirically observed that flatness measures consistently correlate strongly with generalization, it remains an open theoretical problem why and under which circumstances flatness is connected to generalization, especially in light of reparameterizations that change certain flatness measures but leave generalization unchanged. We investigate the connection between flatness and generalization by relating it to interpolation from representative data, deriving the notions of representativeness and feature robustness. These notions allow us to rigorously connect flatness and generalization and to identify the conditions under which the connection holds. Moreover, they give rise to a novel but natural relative flatness measure that correlates strongly with generalization, reduces to ridge regression for ordinary least squares, and resolves the reparameterization issue.
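The claimed reduction to ridge regression can be illustrated with a minimal numerical sketch. Assuming (for illustration only, this is not necessarily the paper's exact relative flatness measure) a scale-balanced penalty of the form ||w||² · tr(H), where H is the Hessian of the ordinary-least-squares loss, the penalized objective has exactly the form of ridge regression, and its closed-form ridge solution is a stationary point of the penalized objective:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

# OLS loss L(w) = ||Xw - y||^2 / n has constant Hessian H = 2 X^T X / n.
H = 2.0 * X.T @ X / n
lam = 0.1  # hypothetical penalty strength, chosen arbitrarily for the sketch

# Hypothetical flatness penalty rho(w) = ||w||^2 * tr(H) (an assumption for
# illustration). The penalized objective L(w) + lam * rho(w) is ridge
# regression with coefficient alpha = n * lam * tr(H):
alpha = n * lam * np.trace(H)
w_ridge = np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)

# Verify: the gradient of the flatness-penalized objective vanishes at the
# ridge closed-form solution, confirming the two problems coincide here.
grad = (2.0 / n) * X.T @ (X @ w_ridge - y) + 2.0 * lam * np.trace(H) * w_ridge
print(np.abs(grad).max())  # numerically zero
```

Because tr(H) is constant in w for a quadratic loss, the penalty collapses to a plain L2 term, which is why the correspondence with ridge regression is exact in this linear setting; for neural networks the Hessian depends on w and no such closed form exists.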