The complex nature of combining localization and classification in object detection has led to a flourishing development of methods. Previous works tried to improve performance in various object detection heads but failed to present a unified view. In this paper, we present a novel dynamic head framework that unifies object detection heads with attentions. By coherently combining multiple self-attention mechanisms between feature levels for scale-awareness, among spatial locations for spatial-awareness, and within output channels for task-awareness, the proposed approach significantly improves the representation ability of object detection heads without any computational overhead. Further experiments demonstrate the effectiveness and efficiency of the proposed dynamic head on the COCO benchmark. With a standard ResNeXt-101-DCN backbone, we largely improve the performance over popular object detectors and achieve a new state of the art at 54.0 AP. Furthermore, with the latest transformer backbone and extra data, we push the current best COCO result to a new record of 60.6 AP. The code will be released at https://github.com/microsoft/DynamicHead.
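The three attentions described above can be illustrated with a minimal NumPy sketch. This is only a toy approximation under simplifying assumptions: features are a dense tensor of shape (levels, spatial positions, channels), the scale-aware attention is a sigmoid gate per level, the spatial-aware attention is plain softmax pooling (the paper uses deformable convolution, omitted here), and the task-aware attention is a clipped channel gate. All function names are illustrative, not the authors' API.

```python
import numpy as np

def scale_aware(F):
    # F has shape (L, S, C): L feature levels, S spatial positions, C channels.
    # Gate each level by a sigmoid of its global average activation.
    w = 1.0 / (1.0 + np.exp(-F.mean(axis=(1, 2))))       # (L,)
    return F * w[:, None, None]

def spatial_aware(F):
    # Toy stand-in: softmax weights over spatial positions per level
    # (the paper's version uses deformable convolution instead).
    a = F.mean(axis=2)                                    # (L, S)
    a = np.exp(a - a.max(axis=1, keepdims=True))
    a = a / a.sum(axis=1, keepdims=True)
    return F * a[:, :, None] * F.shape[1]                 # rescale to keep magnitude

def task_aware(F):
    # Channel-wise gate computed from the globally pooled feature.
    g = np.clip(0.5 + F.mean(axis=(0, 1)), 0.0, 1.0)      # (C,)
    return F * g[None, None, :]

def dynamic_head_block(F, blocks=2):
    # Apply the three attentions sequentially, and stack the block,
    # mirroring the nested form pi_C(pi_S(pi_L(F) * F) * F) * F.
    for _ in range(blocks):
        F = task_aware(spatial_aware(scale_aware(F)))
    return F

rng = np.random.default_rng(0)
F = rng.standard_normal((4, 16, 8))   # 4 levels, 16 positions, 8 channels
out = dynamic_head_block(F)
print(out.shape)                      # (4, 16, 8): shape is preserved
```

Because each attention only rescales the tensor along one axis, the output keeps the input shape, so the block can be stacked repeatedly and dropped into an existing detection head.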