Real-time generic object detection on mobile platforms is a crucial but challenging computer vision task. However, previous CNN-based detectors incur enormous computational cost, which prevents real-time inference in computation-constrained scenarios. In this paper, we investigate the effectiveness of two-stage detectors in real-time generic detection and propose a lightweight two-stage detector named ThunderNet. In the backbone part, we analyze the drawbacks of previous lightweight backbones and present a lightweight backbone designed for object detection. In the detection part, we exploit an extremely efficient RPN and detection head design. To generate more discriminative feature representations, we design two efficient architecture blocks: the Context Enhancement Module and the Spatial Attention Module. Finally, we investigate the balance between the input resolution, the backbone, and the detection head. Compared with lightweight one-stage detectors, ThunderNet achieves superior performance with only 40% of the computational cost on the PASCAL VOC and COCO benchmarks. Without bells and whistles, our model runs at 24.1 fps on an ARM-based device. To the best of our knowledge, this is the first real-time detector reported on ARM platforms. Code will be released for paper reproduction.