In this paper, we propose Self-EMD, a novel self-supervised representation learning method for object detection. Our method is trained directly on unlabeled non-iconic image datasets such as COCO, instead of the commonly used iconic-object image datasets such as ImageNet. We keep the convolutional feature maps as the image embedding to preserve spatial structure, and adopt the Earth Mover's Distance (EMD) to compute the similarity between two embeddings. Our Faster R-CNN (ResNet50-FPN) baseline achieves 39.8% mAP on COCO, which is on par with state-of-the-art self-supervised methods pre-trained on ImageNet. More importantly, it can be further improved to 40.4% mAP with more unlabeled images, showing its potential for leveraging larger amounts of easily obtained unlabeled data. Code will be made available.
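To make the idea of comparing two spatial embeddings with EMD concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each feature map is flattened into a list of per-location vectors, that every location carries uniform weight, that the transport cost is one minus cosine similarity, and that Sinkhorn iterations (an entropic approximation to the exact EMD) are an acceptable solver. None of these choices is stated in the abstract, and the function names `cosine`, `sinkhorn_plan`, and `emd_similarity` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def sinkhorn_plan(cost, eps=0.05, n_iters=200):
    """Approximate transport plan between uniform marginals (assumption).

    Sinkhorn solves an entropy-regularized problem, so the result only
    approximates the exact EMD plan; smaller eps gives a closer match.
    """
    n, m = len(cost), len(cost[0])
    r, c = 1.0 / n, 1.0 / m  # uniform weight per spatial location
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [r / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [c / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def emd_similarity(feats_x, feats_y, eps=0.05, n_iters=200):
    """Similarity of two flattened feature maps under the transport plan."""
    sim = [[cosine(x, y) for y in feats_y] for x in feats_x]
    cost = [[1.0 - s for s in row] for row in sim]  # cosine cost (assumption)
    plan = sinkhorn_plan(cost, eps, n_iters)
    return sum(
        plan[i][j] * sim[i][j]
        for i in range(len(feats_x))
        for j in range(len(feats_y))
    )


# Toy 2x2 feature maps with 4 channels, flattened to 4 location vectors.
grid_a = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
# Same content at shifted locations, as two crops of one image might give.
grid_b = [[0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0]]
score = emd_similarity(grid_a, grid_b)
```

Because the plan matches each location in one map to its most similar locations in the other, the score stays high when the same content appears at different positions. A global-pooled embedding would discard that spatial structure entirely.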