The local shape of the loss landscape near a minimum, especially its flatness, has been intensively investigated and is known to play an important role in the generalization of deep models. We developed a training algorithm called PoF (Post-Training of Feature Extractor) that updates the feature extractor part of an already-trained deep model to search for a flatter minimum. Its characteristics are two-fold: 1) the feature extractor is trained under parameter perturbations in the higher-layer parameter space, based on observations suggesting that flattening the higher-layer parameter space is effective, and 2) the perturbation range is determined in a data-driven manner, aiming to reduce the part of the test loss caused by positive loss curvature. We provide a theoretical analysis showing that the proposed algorithm implicitly reduces the target Hessian components as well as the loss. Experimental results show that PoF improved model performance over baseline methods on both the CIFAR-10 and CIFAR-100 datasets with only 10 epochs of post-training, and on the SVHN dataset with 50 epochs of post-training. Source code is available at: \url{https://github.com/DensoITLab/PoF-v1}
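The core idea described above can be illustrated with a minimal NumPy sketch: only the higher-layer (head) parameters are perturbed within a radius at each step, and only the feature-extractor parameters are updated under that perturbed loss. This is an illustrative toy, not the authors' implementation; in particular, the fixed perturbation radius `rho` stands in for the paper's data-driven range, and the linear two-stage model and MSE loss are assumptions for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-stage model: W1 is the feature extractor, W2 the higher-layer head.
d_in, d_feat, d_out = 8, 4, 2
W1 = rng.normal(scale=0.1, size=(d_feat, d_in))   # updated by post-training
W2 = rng.normal(scale=0.1, size=(d_out, d_feat))  # kept fixed; only perturbed

X = rng.normal(size=(32, d_in))
Y = rng.normal(size=(32, d_out))

def mse_loss(W1_, W2_):
    pred = X @ W1_.T @ W2_.T
    return 0.5 * np.mean((pred - Y) ** 2)

lr = 0.1
rho = 0.05  # perturbation radius; data-driven in the paper, fixed here for simplicity
loss_before = mse_loss(W1, W2)

for _ in range(100):
    # 1) Perturb only the higher-layer parameters, within radius rho.
    eps = rng.normal(size=W2.shape)
    eps *= rho / (np.linalg.norm(eps) + 1e-12)
    W2_pert = W2 + eps

    # 2) Gradient of the perturbed loss w.r.t. the feature extractor only.
    pred = X @ W1.T @ W2_pert.T      # predictions, shape (n, d_out)
    err = (pred - Y) / pred.size     # dL/dpred for the mean-squared loss
    grad_W1 = W2_pert.T @ err.T @ X  # chain rule: dL/dW1, shape (d_feat, d_in)
    W1 -= lr * grad_W1

loss_after = mse_loss(W1, W2)
```

Because the head is frozen and only perturbed, the extractor is driven toward features whose loss stays low across a neighborhood of head parameters, i.e. toward a flatter region of the higher-layer parameter space.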