Overparametrized Deep Neural Networks (DNNs) often achieve astounding performance, but may potentially result in severe generalization error. Recently, the relation between the sharpness of the loss landscape and the generalization error was established by Foret et al. (2020), who proposed the Sharpness Aware Minimizer (SAM) to mitigate the degradation of generalization. Unfortunately, SAM's computational cost is roughly double that of base optimizers, such as Stochastic Gradient Descent (SGD). This paper thus proposes the Efficient Sharpness Aware Minimizer (ESAM), which boosts SAM's efficiency at no cost to its generalization performance. ESAM includes two novel and efficient training strategies: Stochastic Weight Perturbation and Sharpness-Sensitive Data Selection. In the former, the sharpness measure is approximated by perturbing a stochastically chosen set of weights in each iteration; in the latter, the SAM loss is optimized using only a judiciously selected subset of data that is sensitive to the sharpness. We provide theoretical explanations as to why these strategies perform well. We also show, via extensive experiments on the CIFAR and ImageNet datasets, that ESAM reduces the extra computation that SAM requires over base optimizers from 100% to 40%, while test accuracies are preserved or even improved.
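To make the two-pass structure of SAM and the Stochastic Weight Perturbation idea concrete, the following is a minimal PyTorch-style sketch, not the authors' implementation. The function name `esam_like_step` and the parameter `beta` (the fraction of parameter tensors perturbed per step) are illustrative assumptions; Sharpness-Sensitive Data Selection is not shown.

```python
import torch

def esam_like_step(model, loss_fn, inputs, targets, base_optimizer,
                   rho=0.05, beta=0.5):
    """One SAM-style step with stochastic weight perturbation (a sketch).

    rho  : perturbation radius, as in SAM.
    beta : fraction of parameter tensors perturbed per step
           (hypothetical knob standing in for the SWP ratio).
    """
    # 1) First forward/backward pass: gradients at the current weights.
    loss = loss_fn(model(inputs), targets)
    loss.backward()

    # 2) Perturb a randomly chosen subset of parameter tensors toward
    #    higher loss: epsilon = rho * g / ||g||, applied per sampled tensor.
    eps = {}
    grad_norm = torch.norm(torch.stack(
        [p.grad.norm() for p in model.parameters() if p.grad is not None]))
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None or torch.rand(1).item() > beta:
                continue
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)
            eps[p] = e
    model.zero_grad()

    # 3) Second pass: gradient of the (approximate) sharpness-aware loss
    #    evaluated at the perturbed weights.
    loss_fn(model(inputs), targets).backward()

    # 4) Undo the perturbation and take the base-optimizer step.
    with torch.no_grad():
        for p, e in eps.items():
            p.sub_(e)
    base_optimizer.step()
    base_optimizer.zero_grad()
    return loss.item()
```

With `beta = 1.0` every parameter tensor is perturbed and the step reduces to a plain SAM update; smaller `beta` skips part of the perturbation work, which is the source of the efficiency gain described above.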