We propose a novel single-shot object detection network named Detection with Enriched Semantics (DES). Our motivation is to enrich the semantics of object detection features within a typical deep detector through a semantic segmentation branch and a global activation module. The segmentation branch is supervised by weak segmentation ground truth, i.e., no extra annotation is required. In conjunction with that, we employ a global activation module which learns the relationship between channels and object classes in a self-supervised manner. Comprehensive experimental results on both the PASCAL VOC and MS COCO detection datasets demonstrate the effectiveness of the proposed method. In particular, with a VGG16-based DES, we achieve an mAP of 81.7 on VOC2007 test and an mAP of 32.8 on COCO test-dev with an inference speed of 31.5 milliseconds per image on a Titan Xp GPU. With a lower-resolution version, we achieve an mAP of 79.7 on VOC2007 with an inference speed of 13.0 milliseconds per image.
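To illustrate how segmentation supervision can come with "no extra annotation", the sketch below builds a per-pixel class map from detection bounding boxes alone. This is only one plausible reading of "weak segmentation ground truth" and is not taken from the abstract: the function name `weak_segmentation_mask`, the box tuple layout, the `BACKGROUND` label, and the rule that the smaller box wins where boxes overlap are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical box layout: (x_min, y_min, x_max, y_max, class_id),
# in pixel coordinates with exclusive maxima.
Box = Tuple[int, int, int, int, int]

BACKGROUND = 0  # assumed label for pixels that fall outside every box


def weak_segmentation_mask(height: int, width: int, boxes: List[Box]) -> List[List[int]]:
    """Build a coarse per-pixel class map using only bounding boxes."""
    mask = [[BACKGROUND] * width for _ in range(height)]

    # Paint larger boxes first, so smaller boxes overwrite them where they
    # overlap (assumed tie-breaking rule, not stated in the abstract).
    by_area_desc = sorted(
        boxes,
        key=lambda b: (b[2] - b[0]) * (b[3] - b[1]),
        reverse=True,
    )

    for x0, y0, x1, y1, class_id in by_area_desc:
        # Clip the box to the image bounds before painting.
        x0, y0 = max(x0, 0), max(y0, 0)
        x1, y1 = min(x1, width), min(y1, height)
        for y in range(y0, y1):
            for x in range(x0, x1):
                mask[y][x] = class_id
    return mask


if __name__ == "__main__":
    # Two overlapping boxes on a 4x6 image: class 1 (large) and class 2 (small).
    example_boxes = [(0, 0, 4, 4, 1), (2, 2, 4, 4, 2)]
    for row in weak_segmentation_mask(4, 6, example_boxes):
        print(row)
```

Under these assumptions, every pixel inside a box inherits that box's class and everything else is background, which gives a coarse target that a segmentation branch could be trained against without any pixel-level labels.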