Due to the diverse architectures of deep neural networks (DNNs) and their severe overparameterization, regularization techniques are critical for finding optimal solutions in the huge hypothesis space. In this paper, we propose an effective regularization technique called Neighborhood Region Smoothing (NRS). NRS builds on the finding that models benefit from converging to flat minima, and regularizes the neighborhood region in weight space so that models within it yield approximately equal outputs. Specifically, the gap between the outputs of models in the neighborhood region is gauged by a metric based on the Kullback-Leibler divergence. This metric provides insights similar to those of the minimum description length principle for interpreting flat minima. By minimizing both this divergence and the empirical loss, NRS explicitly drives the optimizer toward flat minima. We confirm the effectiveness of NRS on image classification tasks across a wide range of model architectures on commonly used datasets such as CIFAR and ImageNet, where generalization ability is universally improved. We also show empirically that the minima found by NRS have relatively smaller Hessian eigenvalues than those found by the conventional method, which is regarded as evidence of flat minima.
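The objective described above (empirical loss plus a KL-divergence term between the outputs of the current model and a model drawn from its weight-space neighborhood) can be illustrated with a minimal PyTorch sketch. This is not the paper's exact formulation: the function name `nrs_loss`, the Gaussian perturbation with scale `radius`, the weighting factor `lam`, and the direction of the KL term are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F
from torch.func import functional_call

def nrs_loss(model, x, y, radius=0.01, lam=1.0):
    """Sketch of an NRS-style objective: empirical loss plus a KL term
    between the model's outputs and those of a randomly perturbed copy
    of the weights taken from a small neighborhood region.
    `radius` and `lam` are hypothetical hyperparameters for illustration."""
    # Empirical loss at the current weights.
    logits = model(x)
    ce = F.cross_entropy(logits, y)

    # Evaluate the model at a randomly perturbed point in the neighborhood
    # region without mutating the original parameters.
    perturbed = {
        name: p + radius * torch.randn_like(p)
        for name, p in model.named_parameters()
    }
    logits_nb = functional_call(model, perturbed, (x,))

    # KL divergence between the two output distributions, gauging the
    # output gap across the neighborhood region.
    kl = F.kl_div(F.log_softmax(logits_nb, dim=-1),
                  F.softmax(logits, dim=-1),
                  reduction="batchmean")

    # Minimizing both terms encourages convergence to flat minima.
    return ce + lam * kl
```

In a training loop, one would call `loss = nrs_loss(model, x, y)` followed by `loss.backward()` and an optimizer step; how the neighborhood is sampled and how the divergence is aggregated in the actual method may differ from this sketch.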