Data-augmentation is key to the training of neural networks for image classification. This paper first shows that existing augmentations induce a significant discrepancy between the typical size of the objects seen by the classifier at train and test time. We experimentally validate that, for a target test resolution, using a lower train resolution offers better classification at test time. We then propose a simple yet effective and efficient strategy to optimize the classifier performance when the train and test resolutions differ. It involves only a computationally cheap fine-tuning of the network at the test resolution. This enables training strong classifiers using small training images. For instance, we obtain 77.1% top-1 accuracy on ImageNet with a ResNet-50 trained on 128x128 images, and 79.8% with one trained on 224x224 images. In addition, if we use extra training data we get 82.5% with the ResNet-50 trained with 224x224 images. Conversely, when training a ResNeXt-101 32x48d pre-trained in weakly-supervised fashion on 940 million public images at resolution 224x224 and further optimizing for test resolution 320x320, we obtain a test top-1 accuracy of 86.4% (top-5: 98.0%) (single-crop). To the best of our knowledge this is the highest ImageNet single-crop, top-1 and top-5 accuracy to date.
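The following is a minimal PyTorch sketch of the core idea: take a network trained at a lower resolution and cheaply adapt it at the (higher) test resolution. The resolutions, learning rate, resize ratio, and the choice to restrict fine-tuning to the batch-norm layers and final classifier are illustrative assumptions, not the paper's exact recipe.

```python
# Sketch of fine-tuning at test resolution (assumes PyTorch/torchvision).
import torch
import torchvision.models as models
import torchvision.transforms as T

TRAIN_RES, TEST_RES = 128, 224  # train small, then adapt to the test size

model = models.resnet50(weights=None)
# ... assume the model was already trained on TRAIN_RES x TRAIN_RES crops ...

# Freeze the backbone; adapt only batch-norm statistics and the classifier
# at the test resolution (a computationally cheap fine-tuning step).
for p in model.parameters():
    p.requires_grad = False
for m in model.modules():
    if isinstance(m, torch.nn.BatchNorm2d):
        m.train()  # let BN running statistics adapt to the new resolution
for p in model.fc.parameters():
    p.requires_grad = True

# Evaluation-style preprocessing at the test resolution
# (the 1.15 resize ratio is an assumption).
finetune_tf = T.Compose([
    T.Resize(int(TEST_RES * 1.15)),
    T.CenterCrop(TEST_RES),
    T.ToTensor(),
])

optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-3, momentum=0.9)
# A brief fine-tuning pass over the training set at TEST_RES would follow,
# after which the model is evaluated with single TEST_RES center crops.
```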