ImageNet pre-training initialization is the de-facto standard for object detection. He et al. found that it is possible to train detectors from scratch (random initialization), although this requires a longer training schedule and proper normalization. In this paper, we explore pre-training directly on the target dataset for object detection. In this setting, we find that the widely adopted large-resize strategy, e.g., resizing images to (1333, 800), is important for fine-tuning but not necessary for pre-training. Specifically, we propose a new training pipeline for object detection that follows the `pre-training and fine-tuning' paradigm: pre-train the detector on low-resolution images from the target dataset, then load the weights and fine-tune on high-resolution images. With this strategy, we can use batch normalization (BN) with a large batch size during pre-training; it is also memory efficient, so it can run on machines with very limited GPU memory (11 GB). We call this direct detection pre-training, or direct pre-training for short. Experiments show that direct pre-training accelerates the pre-training phase by more than 11x on the COCO dataset while achieving +1.8 mAP compared to ImageNet pre-training. Moreover, we find that direct pre-training is also applicable to transformer-based backbones, e.g., Swin Transformer. Code will be available.
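The two-phase pipeline described above can be sketched as follows. This is a minimal illustration under assumptions, not the authors' implementation: it uses torchvision's (>= 0.13) Faster R-CNN builder as a stand-in detector, the low-resolution setting (short side 320), the checkpoint filename, and the omitted training loops are placeholders, and the paper's BN-with-large-batch configuration for the pre-training phase is not reproduced here.

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Phase 1: direct pre-training from scratch on the target dataset,
# using a low-resolution resize (hypothetical choice: short side 320).
# No ImageNet weights are loaded for either the detector or the backbone.
pretrain_model = fasterrcnn_resnet50_fpn(
    weights=None,            # random initialization of detection heads
    weights_backbone=None,   # random initialization of the backbone
    min_size=320,            # low-resolution input for cheap pre-training
    max_size=533,
)
# ... standard detection training loop over the target dataset goes here ...
torch.save(pretrain_model.state_dict(), "direct_pretrain.pth")

# Phase 2: fine-tuning with the standard high-resolution resize (1333, 800),
# initialized from the direct pre-training checkpoint instead of ImageNet.
finetune_model = fasterrcnn_resnet50_fpn(
    weights=None,
    weights_backbone=None,
    min_size=800,            # standard (1333, 800) resize for fine-tuning
    max_size=1333,
)
finetune_model.load_state_dict(torch.load("direct_pretrain.pth"))
# ... continue fine-tuning on the same target dataset at full resolution ...
```

Because the resize settings only affect the input transform and not the model parameters, the state dict saved after low-resolution pre-training loads directly into the high-resolution fine-tuning model.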