Object detection with Transformers (DETR) has achieved performance competitive with traditional detectors such as Faster R-CNN. However, the potential of DETR remains largely unexplored for the more challenging task of arbitrary-oriented object detection. We make the first attempt and implement Oriented Object DEtection with TRansformer ($\bf O^2DETR$), an end-to-end network. The contributions of $\rm O^2DETR$ are: 1) we provide a new insight into oriented object detection by applying a Transformer to localize objects directly and efficiently, without the tedious process of rotated anchors used in conventional detectors; 2) we design a simple but highly efficient encoder for the Transformer by replacing the attention mechanism with depthwise separable convolution, which can significantly reduce the memory and computational cost of using multi-scale features in the original Transformer; 3) $\rm O^2DETR$ can serve as a new benchmark in the field of oriented object detection, achieving up to 3.85 mAP improvement over Faster R-CNN and RetinaNet. By simply fine-tuning the head mounted on $\rm O^2DETR$ in a cascaded architecture, we achieve performance competitive with the state of the art on the DOTA dataset.
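To see why replacing attention with depthwise separable convolution helps once multi-scale features are used, a rough cost model is useful. Global self-attention over all pyramid levels flattened into one sequence of $N$ tokens scales with $N^2$, while a depthwise separable convolution applied per level scales linearly with the number of positions. The sketch below is an illustration under stated assumptions, not the authors' encoder: the image size (800×800), strides (8 to 64), channel width (256), 3×3 kernel, and 8 heads are hypothetical choices, and only the core multiply-accumulates (MACs) are counted, ignoring normalization, feed-forward layers, and positional encodings.

```python
import math

# Hypothetical settings for illustration only (not taken from the paper):
# an 800x800 image, four pyramid levels at strides 8..64, 256 channels.
IMAGE_SIZE = 800
STRIDES = (8, 16, 32, 64)
D_MODEL = 256
KERNEL = 3
HEADS = 8


def level_shapes(image_size, strides):
    """Spatial (H, W) of each square feature map, rounding up like a strided backbone."""
    return [(math.ceil(image_size / s), math.ceil(image_size / s)) for s in strides]


def attention_macs(num_tokens, d):
    """Approximate MACs of one global self-attention layer.

    4*N*d^2 covers the Q, K, V and output projections;
    2*N^2*d covers Q @ K^T and the attention-weighted sum over V.
    """
    return 4 * num_tokens * d * d + 2 * num_tokens * num_tokens * d


def attention_map_bytes(num_tokens, heads, bytes_per_value=4):
    """Memory for the N x N attention maps of all heads (fp32 by default)."""
    return num_tokens * num_tokens * heads * bytes_per_value


def dsconv_macs(shapes, d, k):
    """MACs of a k x k depthwise conv followed by a 1x1 pointwise conv,
    applied separately to every pyramid level (no cross-level token mixing)."""
    per_position = k * k * d + d * d  # depthwise + pointwise
    return sum(h * w * per_position for h, w in shapes)


shapes = level_shapes(IMAGE_SIZE, STRIDES)
n_tokens = sum(h * w for h, w in shapes)  # all levels flattened into one sequence
attn = attention_macs(n_tokens, D_MODEL)
conv = dsconv_macs(shapes, D_MODEL, KERNEL)

print(f"tokens across levels:   {n_tokens}")
print(f"self-attention MACs:    {attn:.3e}")
print(f"attention maps (GiB):   {attention_map_bytes(n_tokens, HEADS) / 2**30:.1f}")
print(f"depthwise-sep. MACs:    {conv:.3e}")
print(f"attention / conv ratio: {attn / conv:.1f}x")
```

Under these assumptions the $2N^2d$ term dominates the attention cost, and the attention maps alone grow quadratically with the number of tokens, whereas the convolutional encoder's cost grows only with the total number of feature-map positions. The exact ratio depends entirely on the hypothetical settings above.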