Sharpness-aware minimization (SAM) aims to improve the generalisation of gradient-based learning by seeking out flat minima. In this work, we establish connections between SAM and Mean-Field Variational Inference (MFVI) of neural network parameters. We show that both methods admit interpretations as optimising a notion of flatness, and that, when the reparametrisation trick is used, both reduce to computing the gradient at a perturbed version of the current mean parameter. This connection motivates our study of algorithms that combine or interpolate between SAM and MFVI. We evaluate the proposed variational algorithms on several benchmark datasets and compare their performance to variants of SAM. Taking a broader perspective, our work suggests that SAM-like updates can serve as a drop-in replacement for the reparametrisation trick.
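As a minimal sketch of this shared structure (in NumPy, with a hypothetical loss gradient `grad_L` standing in for backpropagation; neither function is the paper's exact algorithm): SAM perturbs the mean parameter along the normalised ascent direction, while reparametrised MFVI perturbs it with Gaussian noise, and both then evaluate the gradient at the perturbed point.

```python
import numpy as np

def sam_gradient(mu, grad_L, rho=0.05):
    # First-order SAM step: ascend a distance rho along the normalised
    # gradient, then evaluate the gradient at that perturbed point.
    g = grad_L(mu)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # worst-case perturbation
    return grad_L(mu + eps)

def mfvi_gradient(mu, sigma, grad_L, rng):
    # One-sample reparametrised estimate of the MFVI gradient of
    # E[L(mu + sigma * eps)] w.r.t. mu, with eps ~ N(0, I): evaluate the
    # gradient at a Gaussian-perturbed version of the mean parameter.
    eps = sigma * rng.standard_normal(mu.shape)  # random perturbation
    return grad_L(mu + eps)

# Toy usage on the quadratic loss L(w) = 0.5 * ||w||^2, so grad_L(w) = w.
grad_L = lambda w: w
mu = np.array([1.0, -2.0])
print(sam_gradient(mu, grad_L))
print(mfvi_gradient(mu, sigma=0.1, grad_L=grad_L, rng=np.random.default_rng(0)))
```

The only difference between the two updates in this sketch is how the perturbation is chosen (adversarial versus random Gaussian), which is what makes interpolating between them natural.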