In this paper, we aim to design an efficient real-time object detector that exceeds the YOLO series and is easily extensible for many object recognition tasks such as instance segmentation and rotated object detection. To obtain a more efficient model architecture, we explore an architecture that has compatible capacities in the backbone and neck, constructed by a basic building block that consists of large-kernel depth-wise convolutions. We further introduce soft labels when calculating matching costs in the dynamic label assignment to improve accuracy. Together with better training techniques, the resulting object detector, named RTMDet, achieves 52.8% AP on COCO with 300+ FPS on an NVIDIA 3090 GPU, outperforming the current mainstream industrial detectors. RTMDet achieves the best parameter-accuracy trade-off with tiny/small/medium/large/extra-large model sizes for various application scenarios, and obtains new state-of-the-art performance on real-time instance segmentation and rotated object detection. We hope the experimental results can provide new insights into designing versatile real-time object detectors for many object recognition tasks. Code and models are released at https://github.com/open-mmlab/mmdetection/tree/3.x/configs/rtmdet.
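The idea of using soft labels in the matching cost can be illustrated with a small sketch. The abstract does not give the cost formula, so the form below is an assumption made for illustration only: the IoU between a predicted box and a ground-truth box serves as the soft classification target, and the binary cross-entropy is scaled by the squared gap between that target and the predicted score. The function names `soft_label_cls_cost` and `hard_label_cls_cost` are hypothetical and are not taken from the released code.

```python
import math


def _clamp(p: float, eps: float = 1e-12) -> float:
    """Keep a probability strictly inside (0, 1) so the logs stay finite."""
    return min(max(p, eps), 1.0 - eps)


def hard_label_cls_cost(pred_score: float) -> float:
    """Classification cost with a hard (binary) label of 1 for a candidate match."""
    return -math.log(_clamp(pred_score))


def soft_label_cls_cost(pred_score: float, iou: float) -> float:
    """Classification cost with the IoU used as a soft label.

    Assumed form (illustrative, not confirmed by the abstract):
        BCE(pred_score, iou) * (iou - pred_score) ** 2
    """
    p = _clamp(pred_score)
    bce = -(iou * math.log(p) + (1.0 - iou) * math.log(1.0 - p))
    return bce * (iou - p) ** 2


# Two candidate predictions with the same class score but different localization quality.
well_localized = soft_label_cls_cost(pred_score=0.9, iou=0.9)
poorly_localized = soft_label_cls_cost(pred_score=0.9, iou=0.3)

print("hard-label cost (both candidates):", hard_label_cls_cost(0.9))
print("soft-label cost, IoU 0.9:", well_localized)
print("soft-label cost, IoU 0.3:", poorly_localized)
```

With a hard label, both candidates receive the same classification cost because only the score matters. With the soft label, the candidate whose score agrees with its IoU is cheaper to match, so the assigner can tell a well-localized prediction from a poorly localized one.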