Point clouds and images can provide complementary information when representing 3D objects, and fusing the two kinds of data usually improves detection results. However, fusing the two data modalities is challenging due to their different characteristics and the interference from regions of non-interest. To address this problem, we propose a Multi-Branch Deep Fusion Network (MBDF-Net) for 3D object detection. The proposed detector has two stages. In the first stage, our multi-branch feature extraction network uses Adaptive Attention Fusion (AAF) modules to produce cross-modal fusion features from single-modal semantic features. In the second stage, we use a region-of-interest (RoI)-pooled fusion module to generate enhanced local features for refinement. We also propose a novel attention-based hybrid sampling strategy for selecting key points during downsampling. We evaluate our approach on two widely used benchmark datasets, KITTI and SUN-RGBD. The experimental results demonstrate the advantages of our method over state-of-the-art approaches.