Recent sharpness-aware minimisation (SAM) is known to find flat minima, which are beneficial for better generalisation with improved robustness. SAM essentially modifies the loss function by reporting the maximum loss value within a small neighborhood around the current iterate. However, it uses a Euclidean ball to define the neighborhood, which can be inaccurate since loss functions for neural networks are typically defined over probability distributions (e.g., class predictive probabilities), rendering the parameter space non-Euclidean. In this paper we consider the information geometry of the model parameter space when defining the neighborhood, namely replacing SAM's Euclidean balls with ellipsoids induced by the Fisher information. Our approach, dubbed Fisher SAM, defines more accurate neighborhood structures that conform to the intrinsic metric of the underlying statistical manifold. For instance, because it ignores the geometry of the parameter space, SAM may probe the worst-case loss value at a point that is either too nearby or inappropriately distant; Fisher SAM avoids this. Another recent approach, Adaptive SAM, stretches/shrinks the Euclidean ball in accordance with the scale of the parameter magnitudes; this can be dangerous, potentially destroying the neighborhood structure. We demonstrate improved performance of the proposed Fisher SAM on several benchmark datasets/tasks.
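The core difference between the two neighborhood definitions can be illustrated with a short sketch. Below is a minimal NumPy illustration, not the paper's implementation: the SAM worst-case perturbation maximises the linearised loss within a Euclidean ball of radius rho, while a Fisher-ellipsoid variant maximises it subject to eps^T F eps <= rho^2 (here using a diagonal Fisher approximation, which is an assumption for simplicity).

```python
import numpy as np

def sam_perturbation(grad, rho):
    # SAM: worst-case direction within the Euclidean ball ||eps|| <= rho,
    # i.e. eps = rho * g / ||g|| (first-order approximation of the inner max).
    return rho * grad / (np.linalg.norm(grad) + 1e-12)

def fisher_sam_perturbation(grad, fisher_diag, rho):
    # Fisher-ellipsoid sketch with a *diagonal* Fisher approximation (an
    # assumption here): maximise g^T eps subject to eps^T F eps <= rho^2.
    # Closed form: eps = rho * F^{-1} g / sqrt(g^T F^{-1} g).
    inv_f_grad = grad / (fisher_diag + 1e-12)
    denom = np.sqrt(np.dot(grad, inv_f_grad)) + 1e-12
    return rho * inv_f_grad / denom

# With an identity Fisher, the ellipsoid reduces to the Euclidean ball,
# so both perturbations coincide; an anisotropic Fisher rescales the
# worst-case direction along low-curvature coordinates.
g = np.array([3.0, 4.0])
eps_sam = sam_perturbation(g, rho=0.1)
eps_fisher = fisher_sam_perturbation(g, np.array([1.0, 4.0]), rho=0.1)
```

The closed-form solution follows from a Lagrangian of the constrained linear maximisation; the constraint is active at the optimum, so eps^T F eps = rho^2 holds exactly.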