We propose Dataset Reinforcement, a strategy to improve a dataset once such that the accuracy of any model architecture trained on the reinforced dataset is improved at no additional training cost for users. We propose a Dataset Reinforcement strategy based on data augmentation and knowledge distillation. Our generic strategy is designed based on extensive analysis across CNN- and transformer-based models and a large-scale study of distillation with state-of-the-art models and various data augmentations. We create a reinforced version of the ImageNet training dataset, called ImageNet+, as well as reinforced datasets CIFAR-100+, Flowers-102+, and Food-101+. Models trained with ImageNet+ are more accurate, robust, and calibrated, and transfer well to downstream tasks (e.g., segmentation and detection). As an example, the accuracy of ResNet-50 improves by 1.7% on the ImageNet validation set, 3.5% on ImageNetV2, and 10.0% on ImageNet-R. Expected Calibration Error (ECE) on the ImageNet validation set is also reduced by 9.9%. Using this backbone with Mask-RCNN for object detection on MS-COCO, the mean average precision improves by 0.8%. We reach similar gains for MobileNets, ViTs, and Swin-Transformers. For MobileNetV3 and Swin-Tiny, we observe significant robustness improvements on ImageNet-R/A/C of up to 10%. Models pretrained on ImageNet+ and fine-tuned on CIFAR-100+, Flowers-102+, and Food-101+ reach up to 3.4% improved accuracy.
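To make the idea concrete, below is a minimal PyTorch-style sketch of dataset reinforcement, not the paper's released pipeline: a one-time pass stores, per training sample, a replayable augmentation and the teacher's sparse soft labels, after which any student trains against the stored targets with no teacher forward passes at training time. The `reinforce` and `student_loss` helpers, the top-k sparsification value, and the seed-replay mechanism are assumptions made for illustration.

```python
# Illustrative sketch of Dataset Reinforcement (assumptions noted above).
# Assumes a pretrained `teacher`, a `train_set` of (PIL image, label) pairs,
# and augmentations driven by torch's RNG so they can be replayed from a seed.
import torch
import torch.nn.functional as F
import torchvision.transforms as T

augment = T.Compose([
    T.RandomResizedCrop(224),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
])

@torch.no_grad()
def reinforce(teacher, train_set, k=5):
    """One-time pass: store, per sample, the augmentation seed and the
    teacher's top-k probabilities (a sparse soft label)."""
    teacher.eval()
    reinforced = []
    for idx in range(len(train_set)):
        image, _ = train_set[idx]
        seed = torch.seed()              # record RNG seed to replay the augmentation
        x = augment(image).unsqueeze(0)
        probs = teacher(x).softmax(dim=-1).squeeze(0)
        topv, topi = probs.topk(k)       # sparse storage keeps the reinforced file small
        reinforced.append((idx, seed, topi, topv))
    return reinforced

def student_loss(student, train_set, record):
    """Student training step: replay the stored augmentation and match the
    stored sparse teacher probabilities; the teacher is never run here."""
    idx, seed, topi, topv = record
    image, _ = train_set[idx]
    torch.manual_seed(seed)              # replay the exact augmentation
    x = augment(image).unsqueeze(0)
    log_p = student(x).log_softmax(dim=-1).squeeze(0)
    return -(topv * log_p[topi]).sum()   # cross-entropy against the sparse soft label
```

Storing only seeds and top-k probabilities, rather than augmented images or full distributions, is what keeps the reinforced dataset cheap to distribute and the student's training cost unchanged.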