Various models have been proposed for object detection. However, most require many hand-designed components, such as anchors and non-maximum suppression (NMS), to achieve good performance. To mitigate these issues, the Transformer-based DETR and its variant, Deformable DETR, were proposed. They resolved much of the complexity of designing a head for object detection models; however, doubts remain about whether Transformer-based models can be regarded as state-of-the-art in object detection, since models that depend on anchors and NMS have reported better results. Furthermore, it has been unclear whether an end-to-end detection pipeline can be built from attention modules alone, because DETR-style Transformer methods still use a convolutional neural network (CNN) backbone. In this study, we propose the Task Specific Split Transformer (TSST), which combines several attention modules to achieve state-of-the-art results on COCO without traditionally hand-designed components. By splitting the general-purpose attention module into two separate goal-specific attention modules, the proposed method enables the design of simpler object detection models. Extensive experiments on the COCO benchmark demonstrate the effectiveness of our approach. Code is available at https://github.com/navervision/tsst
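The abstract does not detail the TSST architecture, so the following is only a minimal PyTorch sketch of the stated idea of splitting one general-purpose attention module into two goal-specific ones (here assumed to serve classification and box regression); the class and variable names are hypothetical, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TaskSplitAttention(nn.Module):
    """Illustrative sketch (hypothetical, not the published TSST):
    route one shared query stream through two separate attention
    modules, one specialized per task."""

    def __init__(self, d_model: int = 256, n_heads: int = 8):
        super().__init__()
        # Two independent attention modules with their own projections,
        # so the two task heads no longer share a single attention block.
        self.cls_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.box_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, queries: torch.Tensor, memory: torch.Tensor):
        # Both branches attend to the same encoder memory, but each
        # learns task-specific projections for its own objective.
        cls_feat, _ = self.cls_attn(queries, memory, memory)
        box_feat, _ = self.box_attn(queries, memory, memory)
        return cls_feat, box_feat

# Usage: 100 object queries attending over flattened image features.
queries = torch.randn(2, 100, 256)
memory = torch.randn(2, 1024, 256)
cls_feat, box_feat = TaskSplitAttention()(queries, memory)
```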