The primary goal of training early convolutional neural networks (CNNs) was to maximize the generalization performance of the model. However, since the introduction of the expected calibration error (ECE), which quantifies how reliably a model's predicted confidence reflects its actual accuracy, research on training models whose predictions can be explained has been active. We hypothesized that a gap between the supervision used during training and that applied at inference leads to overconfidence, and investigated whether performing label distribution learning (LDL) would enhance model calibration in CNN training. To verify this hypothesis, we used a simple LDL setting combined with recent data augmentation techniques. Based on a series of experiments, we obtained the following results: 1) State-of-the-art knowledge distillation (KD) methods significantly impede model calibration. 2) Training with LDL and recent data augmentation markedly improves both model calibration and generalization performance. 3) Online LDL brings additional improvements in model calibration and accuracy with longer training, especially for large models. Using the proposed approach, we simultaneously achieved lower ECE and higher generalization performance on the image classification datasets CIFAR-10, CIFAR-100, STL-10, and ImageNet. We also performed several visualizations and analyses and observed several interesting behaviors in CNN training with LDL.
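For reference, a commonly used binned estimator of ECE (not spelled out in the abstract; the paper may use a different variant) partitions the $n$ test predictions into $M$ equal-width confidence bins $B_m$ and compares per-bin accuracy with average confidence:
$$\mathrm{ECE} = \sum_{m=1}^{M} \frac{|B_m|}{n}\,\bigl|\mathrm{acc}(B_m) - \mathrm{conf}(B_m)\bigr|$$
A perfectly calibrated model has $\mathrm{acc}(B_m) = \mathrm{conf}(B_m)$ in every bin, so lower ECE indicates that predicted confidences better match observed accuracy.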