Translation, Scale and Rotation: Cross-Modal Alignment Meets RGB-Infrared Vehicle Detection

Integrating multispectral data in object detection, especially visible and infrared images, has received great attention in recent years. Since visible (RGB) and infrared (IR) images provide complementary information for handling lighting variations, paired images are used in many fields, such as multispectral pedestrian detection, RGB-IR crowd counting and RGB-IR salient object detection. Compared with natural RGB-IR images, we find that detection in aerial RGB-IR images suffers from a cross-modal weak misalignment problem, which manifests as deviations in the position, size and angle of the same object across the two modalities. In this paper, we mainly address the challenge of cross-modal weak misalignment in aerial RGB-IR images. Specifically, we first explain and analyze the cause of the weak misalignment problem. Then, we propose a Translation-Scale-Rotation Alignment (TSRA) module to address the problem by calibrating the feature maps from the two modalities. The module predicts the deviation between the same object in the two modalities through an alignment process and uses a Modality-Selection (MS) strategy to improve alignment performance. Finally, a two-stream feature alignment detector (TSFADet) based on the TSRA module is constructed for RGB-IR object detection in aerial images. With comprehensive experiments on the public DroneVehicle dataset, we verify that our method reduces the effect of cross-modal misalignment and achieves robust detection results.
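
To make the three kinds of deviation concrete, the following is a minimal sketch of how a translation, scale and rotation offset could be applied to an oriented bounding box so that a box annotated in one modality lines up with the same object in the other. It is an illustration under stated assumptions, not the TSRA module itself: the abstract describes TSRA as calibrating feature maps, whereas this sketch operates directly on box parameters. The names `OrientedBox`, `Deviation`, `apply_deviation` and `corners` are hypothetical, as is the choice of a multiplicative scale offset.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class OrientedBox:
    """Oriented box: center (cx, cy), size (w, h), angle in radians."""
    cx: float
    cy: float
    w: float
    h: float
    angle: float


@dataclass(frozen=True)
class Deviation:
    """Hypothetical cross-modal offset: shift, size ratio, angle change."""
    dx: float = 0.0      # translation along x
    dy: float = 0.0      # translation along y
    sw: float = 1.0      # multiplicative scale on width (an assumption)
    sh: float = 1.0      # multiplicative scale on height (an assumption)
    dangle: float = 0.0  # rotation offset in radians


def apply_deviation(box: OrientedBox, dev: Deviation) -> OrientedBox:
    """Shift, rescale and rotate a box by the given deviation."""
    return OrientedBox(
        cx=box.cx + dev.dx,
        cy=box.cy + dev.dy,
        w=box.w * dev.sw,
        h=box.h * dev.sh,
        angle=box.angle + dev.dangle,
    )


def corners(box: OrientedBox) -> list[tuple[float, float]]:
    """Return the four corner points of the box, rotated about its center."""
    c, s = math.cos(box.angle), math.sin(box.angle)
    half = [(-box.w / 2, -box.h / 2), (box.w / 2, -box.h / 2),
            (box.w / 2, box.h / 2), (-box.w / 2, box.h / 2)]
    return [(box.cx + x * c - y * s, box.cy + x * s + y * c) for x, y in half]


# Example: a box in one modality and an assumed offset to the other.
ir_box = OrientedBox(cx=10.0, cy=20.0, w=4.0, h=2.0, angle=0.0)
offset = Deviation(dx=1.0, dy=-2.0, sw=1.5, sh=1.0, dangle=math.pi / 2)
aligned = apply_deviation(ir_box, offset)
```

In a detector, offsets of this kind would be predicted by a network rather than specified by hand; the sketch only shows what the position, size and angle deviations mean geometrically.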
