Object detection on very-high-resolution (VHR) remote sensing images plays a vital role in applications such as urban planning, land resource management, and rescue missions. The large variation in target scale is one of the main challenges in VHR remote sensing object detection. Existing methods improve detection accuracy by refining the structure of feature pyramids and adopting different attention modules. However, small targets are still frequently missed because key detail features are lost, and there is still room for improvement in how multiscale features are fused and balanced. To address this issue, this paper proposes two novel modules, Guided Attention and Tucker Bilinear Attention, which are applied at the early-fusion and late-fusion stages, respectively. The former effectively retains clean key detail features, and the latter better balances features through semantic-level correlation mining. Based on these two modules, we build a new multi-scale remote sensing object detection framework. Without bells and whistles, the proposed method substantially improves average precision on small objects and achieves the highest mean average precision among 9 state-of-the-art methods on DOTA, DIOR, and NWPU VHR-10. Code and models are available at https://github.com/Shinichict/GTNet.