In this paper, we first investigate why typical two-stage methods are not as fast as single-stage, fast detectors such as YOLO and SSD. We find that Faster R-CNN and R-FCN perform intensive computation after or before RoI warping. Faster R-CNN involves two fully connected layers for RoI recognition, while R-FCN produces large score maps. Thus, these networks are slow because of the heavy-head design in their architecture. Even if we significantly reduce the base model, the computation cost does not decrease accordingly. We propose a new two-stage detector, Light-Head R-CNN, to address this shortcoming in current two-stage approaches. In our design, we make the head of the network as light as possible by using a thin feature map and a cheap R-CNN subnet (pooling and a single fully connected layer). Our ResNet-101 based Light-Head R-CNN outperforms state-of-the-art object detectors on COCO while keeping time efficiency. More importantly, by simply replacing the backbone with a tiny network (e.g., Xception), our Light-Head R-CNN achieves 30.7 mmAP at 102 FPS on COCO, significantly outperforming single-stage, fast detectors such as YOLO and SSD in both speed and accuracy. Code will be made publicly available.
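To make the heavy-head versus light-head contrast concrete, the following is a rough back-of-the-envelope sketch, not the authors' implementation. It counts the parameters of the fully connected layers applied to each RoI, which also approximates the multiply-accumulate operations per RoI. All sizes are illustrative assumptions not stated in the abstract: a 7 x 7 pooling grid, a 512-channel RoI feature with two 4096-wide layers for the heavy head, and 10 channels per bin with one 2048-wide layer for the light head.

```python
def fc_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features


POOL = 7  # assumed RoI pooling grid (7 x 7 bins)

# Heavy head (assumed sizes): 512-channel pooled RoI feature,
# followed by two fully connected layers of width 4096.
heavy_in = 512 * POOL * POOL
heavy_head = fc_params(heavy_in, 4096) + fc_params(4096, 4096)

# Light head (assumed sizes): thin feature map giving 10 channels
# per bin after pooling, followed by a single layer of width 2048.
light_in = 10 * POOL * POOL
light_head = fc_params(light_in, 2048)

print(f"heavy head parameters: {heavy_head:,}")
print(f"light head parameters: {light_head:,}")
print(f"ratio: {heavy_head / light_head:.1f}x")
```

Because this cost is paid once per RoI, it is multiplied by the number of proposals, which is consistent with the observation that shrinking only the base model leaves much of the computation in place.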