To achieve accurate and robust object detection in real-world scenarios, various forms of images are incorporated, such as color, thermal, and depth. However, multimodal data often suffer from the position shift problem, i.e., the image pair is not strictly aligned, so that the same object appears at different positions in different modalities. For deep learning methods, this problem makes it difficult to fuse multimodal features and disturbs the training of the convolutional neural network (CNN). In this article, we propose a general multimodal detector named aligned region CNN (AR-CNN) to tackle the position shift problem. First, a region feature (RF) alignment module with an adjacent similarity constraint is designed to consistently predict the position shift between the two modalities and adaptively align the cross-modal RFs. Second, we propose a novel region of interest (RoI) jitter strategy to improve robustness to unexpected shift patterns. Third, we present a new multimodal feature fusion method that selects the more reliable feature and suppresses the less useful one via feature reweighting. In addition, by locating bounding boxes in both modalities and building their relationships, we provide novel multimodal labeling named KAIST-Paired. Extensive experiments on 2-D and 3-D object detection, RGB-T, and RGB-D datasets demonstrate the effectiveness and robustness of our method.
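The following is a minimal sketch (not the authors' released code) of two ideas mentioned in the abstract: the RoI jitter strategy, interpreted here as randomly perturbing RoI boxes during training to simulate unexpected cross-modal position shifts, and feature reweighting fusion, interpreted as gating the two modal features so the more reliable one dominates. All module and function names (`roi_jitter`, `ReweightFusion`) and hyperparameters (e.g., `max_shift`) are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn


def roi_jitter(rois, max_shift=0.05):
    """Randomly shift RoI boxes (x1, y1, x2, y2) by a fraction of their size.

    rois: (N, 4) tensor; max_shift: maximum relative offset (an assumed value).
    """
    w = (rois[:, 2] - rois[:, 0]).unsqueeze(1)
    h = (rois[:, 3] - rois[:, 1]).unsqueeze(1)
    # Independent random offsets for the x and y coordinates of each box.
    dx = (torch.rand_like(w) * 2 - 1) * max_shift * w
    dy = (torch.rand_like(h) * 2 - 1) * max_shift * h
    offsets = torch.cat([dx, dy, dx, dy], dim=1)
    return rois + offsets


class ReweightFusion(nn.Module):
    """Fuse RGB and thermal region features with learned per-modality weights."""

    def __init__(self, channels):
        super().__init__()
        # A small gating branch mapping the concatenated features to two
        # weights in (0, 1), one per modality.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, 2, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, feat_rgb, feat_thermal):
        w = self.gate(torch.cat([feat_rgb, feat_thermal], dim=1))  # (N, 2, 1, 1)
        w_rgb, w_t = w[:, 0:1], w[:, 1:2]
        # The more reliable modality receives a larger weight; the other is suppressed.
        return w_rgb * feat_rgb + w_t * feat_thermal


if __name__ == "__main__":
    rois = torch.tensor([[10.0, 20.0, 60.0, 120.0]])
    print(roi_jitter(rois))
    fuse = ReweightFusion(channels=256)
    out = fuse(torch.randn(1, 256, 7, 7), torch.randn(1, 256, 7, 7))
    print(out.shape)  # torch.Size([1, 256, 7, 7])
```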