Modern deep neural networks (DNNs) have achieved state-of-the-art performance but are typically over-parameterized. This over-parameterization may result in an undesirably large generalization error in the absence of other customized training strategies. Recently, a line of research under the name of Sharpness-Aware Minimization (SAM) has shown that minimizing a sharpness measure, which reflects the geometry of the loss landscape, can significantly reduce the generalization error. However, SAM-like methods incur a two-fold computational overhead over the given base optimizer (e.g., SGD) in order to approximate the sharpness measure. In this paper, we propose Sharpness-Aware Training for Free, or SAF, which mitigates sharp landscapes at almost zero additional computational cost over the base optimizer. Intuitively, SAF achieves this by avoiding sudden drops in the loss at sharp local minima along the trajectory of weight updates. Specifically, we suggest a novel trajectory loss, based on the KL-divergence between the outputs of the DNN with its current weights and with its past weights, as a replacement for SAM's sharpness measure. This loss captures the rate of change of the training loss along the model's update trajectory. By minimizing it, SAF ensures convergence to a flat minimum with improved generalization capabilities. Extensive empirical results show that SAF minimizes the sharpness in the same way that SAM does, yielding better results on the ImageNet dataset with essentially the same computational cost as the base optimizer.
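To make the trajectory-loss idea concrete, the following is a minimal sketch (not the paper's reference implementation) of a SAF-style regularizer: a KL-divergence between the model's current outputs and outputs recorded with past weights, added to the standard training loss. The names `saf_trajectory_loss`, `lambda_saf`, `past_logits`, and `temperature` are illustrative assumptions, not identifiers from the paper.

```python
# Minimal sketch of a SAF-style trajectory loss, assuming a classification model
# and a per-example buffer of logits cached from past weights (e.g., a previous epoch).
import torch
import torch.nn.functional as F

def saf_trajectory_loss(current_logits, past_logits, temperature=1.0):
    """KL divergence between outputs under past weights and current weights.

    Penalizing this divergence discourages sudden changes in the loss along the
    update trajectory, which is the intuition behind SAF's sharpness surrogate.
    """
    log_p_current = F.log_softmax(current_logits / temperature, dim=-1)
    p_past = F.softmax(past_logits.detach() / temperature, dim=-1)
    # F.kl_div expects log-probabilities as input and probabilities as target.
    return F.kl_div(log_p_current, p_past, reduction="batchmean")

# Hypothetical usage inside one training step:
# logits = model(images)
# loss = F.cross_entropy(logits, labels) \
#        + lambda_saf * saf_trajectory_loss(logits, past_logits[batch_indices])
# past_logits[batch_indices] = logits.detach()  # cache outputs for a later epoch
```

Because the cached logits are produced during ordinary forward passes, this regularizer adds essentially no extra forward or backward computation, which is the sense in which the sharpness-aware training comes "for free" relative to the base optimizer.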