We report competitive results on object detection and instance segmentation on the COCO dataset using standard models trained from random initialization. The results are no worse than their ImageNet pre-training counterparts even when using the hyper-parameters of the baseline system (Mask R-CNN) that were optimized for fine-tuning pre-trained models, with the sole exception of increasing the number of training iterations so the randomly initialized models may converge. Training from random initialization is surprisingly robust; our results hold even when (i) using only 10% of the training data, (ii) training deeper and wider models, and (iii) evaluating on multiple tasks and metrics. Experiments show that ImageNet pre-training speeds up convergence early in training, but does not necessarily provide regularization or improve final target task accuracy. To push the envelope, we demonstrate 50.9 AP on COCO object detection without using any external data, a result on par with the top COCO 2017 competition results that used ImageNet pre-training. These observations challenge the conventional wisdom of ImageNet pre-training for dependent tasks, and we expect these discoveries will encourage people to rethink the current de facto paradigm of `pre-training and fine-tuning' in computer vision.