Mainstream object detectors based on fully convolutional networks have achieved impressive performance. However, most of them still require hand-designed non-maximum suppression (NMS) post-processing, which impedes fully end-to-end training. In this paper, we analyze the effect of discarding NMS, and the results reveal that proper label assignment plays a crucial role. To this end, for fully convolutional detectors, we introduce a Prediction-aware One-To-One (POTO) label assignment for classification to enable end-to-end detection, which achieves performance comparable to that obtained with NMS. In addition, a simple 3D Max Filtering (3DMF) is proposed to utilize multi-scale features and improve the discriminability of convolutions in local regions. With these techniques, our end-to-end framework achieves competitive performance against many state-of-the-art detectors with NMS on the COCO and CrowdHuman datasets. The code is available at https://github.com/Megvii-BaseDetection/DeFCN .
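To make the idea of a prediction-aware one-to-one assignment concrete, the following is a minimal sketch and not the released implementation. It assumes a matching quality of the form score^(1 - alpha) * IoU^alpha, with alpha = 0.8 chosen only for illustration; the abstract does not specify the quality function. The names `iou` and `one_to_one_assign` are hypothetical, and the brute-force search over permutations stands in for a proper bipartite matcher, so it is only practical for a handful of boxes.

```python
from itertools import permutations


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def one_to_one_assign(pred_boxes, pred_scores, gt_boxes, alpha=0.8):
    """Return {gt_index: pred_index} with each prediction used at most once.

    The quality term mixes classification score and IoU, so the assignment
    depends on the current predictions. The exact form and the value of
    alpha are assumptions made for this sketch.
    """
    n_pred, n_gt = len(pred_boxes), len(gt_boxes)
    if n_gt > n_pred:
        raise ValueError("need at least one prediction per ground-truth box")

    # quality[g][p]: how well prediction p explains ground-truth box g.
    quality = [
        [
            (pred_scores[p] ** (1 - alpha)) * (iou(pred_boxes[p], gt_boxes[g]) ** alpha)
            for p in range(n_pred)
        ]
        for g in range(n_gt)
    ]

    # Exhaustive search over one-to-one assignments (illustration only).
    best, best_total = (), -1.0
    for perm in permutations(range(n_pred), n_gt):
        total = sum(quality[g][perm[g]] for g in range(n_gt))
        if total > best_total:
            best, best_total = perm, total
    return {g: p for g, p in enumerate(best)}


if __name__ == "__main__":
    gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
    preds = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 31)]
    scores = [0.9, 0.8, 0.6]
    print(one_to_one_assign(preds, scores, gts))
```

In this toy input, the second prediction is a near-duplicate of the first. Under a one-to-one assignment it receives no ground-truth box and would be treated as background during training, which is the intuition behind training a detector that no longer relies on NMS to remove duplicates.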