Object detection, both 2D and 3D, has been widely adopted in autonomous systems in recent years, and recent research in this field has primarily centered on multimodal approaches to the problem. In this paper, a multimodal fusion approach based on result feature-level fusion is proposed. This method takes the outcome features produced by single-modality sources and fuses them for downstream tasks. Building on this method, a new post-fusion network for multimodal object detection is proposed, which leverages the single-modality outcomes as features. The proposed approach, called Multi-Modal Detector based on Result features (MMDR), is designed to work for both 2D and 3D object detection tasks. Compared with previous multimodal models, the proposed approach performs feature fusion at a later stage, enabling better representation of the deep-level features of each single modality. Additionally, the MMDR model incorporates shallow global features during the feature fusion stage, endowing the model with the ability to perceive background information and the overall input, thereby avoiding issues such as missed detections.
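To make the result feature-level fusion idea concrete, the following is a minimal sketch (not the authors' code) of the scheme the abstract describes: per-modality detectors produce outcome features, which are concatenated with a shallow global feature and passed to a small fusion head. All module names, dimensions, and the concatenation layout are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class ResultFeatureFusion(nn.Module):
    """Hypothetical post-fusion head: fuses single-modality result features
    plus a shallow global (background/context) feature."""
    def __init__(self, result_dim: int = 256, global_dim: int = 64, num_classes: int = 10):
        super().__init__()
        # Fuse two single-modality result features and one shallow global feature.
        self.fusion = nn.Sequential(
            nn.Linear(2 * result_dim + global_dim, result_dim),
            nn.ReLU(inplace=True),
            nn.Linear(result_dim, result_dim),
        )
        self.cls_head = nn.Linear(result_dim, num_classes)  # class scores
        self.reg_head = nn.Linear(result_dim, 4)            # 2D box refinement (x, y, w, h)

    def forward(self, cam_results: torch.Tensor,
                lidar_results: torch.Tensor,
                global_feat: torch.Tensor):
        # cam_results, lidar_results: (N, result_dim) outcome features per candidate
        # global_feat: (N, global_dim) shallow global feature for background context
        fused = self.fusion(torch.cat([cam_results, lidar_results, global_feat], dim=-1))
        return self.cls_head(fused), self.reg_head(fused)

if __name__ == "__main__":
    # Random tensors standing in for the outputs of single-modality detectors.
    model = ResultFeatureFusion()
    cam = torch.randn(8, 256)     # e.g. image-branch result features
    lidar = torch.randn(8, 256)   # e.g. point-cloud-branch result features
    glob = torch.randn(8, 64)     # shallow global context feature
    scores, boxes = model(cam, lidar, glob)
    print(scores.shape, boxes.shape)  # torch.Size([8, 10]) torch.Size([8, 4])
```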