Multi-modal fusion is a fundamental task for the perception system of autonomous driving and has recently attracted many researchers. However, achieving good performance is far from trivial due to noisy raw data, underutilized information, and misalignment between multi-modal sensors. In this paper, we present a literature review of existing multi-modal methods for perception tasks in autonomous driving. Specifically, we analyze in detail over 50 papers that leverage perception sensors, including LiDAR and camera, to solve object detection and semantic segmentation tasks. Different from the traditional way of categorizing fusion models, we propose a more reasonable taxonomy from the viewpoint of the fusion stage, dividing the methods into two major classes and four minor classes. Moreover, we dive deep into current fusion methods, focus on the remaining problems, and open up discussions on potential research opportunities. In summary, this paper presents a new taxonomy of multi-modal fusion methods for autonomous driving perception tasks and aims to provoke thoughts on future fusion-based techniques.
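To make the "fusion stage" viewpoint in the taxonomy concrete, below is a minimal, hypothetical sketch contrasting feature-level fusion (combining intermediate camera and LiDAR features) with decision-level fusion (merging per-modality predictions). The function names and toy feature extractors are illustrative assumptions, not the implementation of any specific method surveyed here.

```python
import numpy as np

# Hypothetical per-modality feature extractors. In a real pipeline these would be
# a camera backbone (e.g., a 2D CNN) and a LiDAR backbone (e.g., a voxel/point network).
def camera_features(image):
    # Toy 64-d camera descriptor: flatten and truncate the image.
    return image.reshape(-1)[:64]

def lidar_features(points):
    # Toy per-cloud descriptor: mean of the (x, y, z) coordinates.
    return points.mean(axis=0)

def feature_level_fusion(image, points):
    """Feature-level (deep) fusion: concatenate intermediate features of both branches;
    a downstream detection or segmentation head would consume the fused vector."""
    return np.concatenate([camera_features(image), lidar_features(points)])

def decision_level_fusion(cam_scores, lidar_scores):
    """Decision-level (late) fusion: merge per-modality predictions, here by averaging."""
    return 0.5 * (cam_scores + lidar_scores)

if __name__ == "__main__":
    image = np.random.rand(8, 8, 3)        # toy camera image
    points = np.random.rand(100, 3)        # toy LiDAR point cloud (N x 3)
    print(feature_level_fusion(image, points).shape)             # (67,)
    print(decision_level_fusion(np.array([0.7, 0.2]),
                                np.array([0.6, 0.3])))           # [0.65 0.25]
```

The design difference is where the modalities meet: feature-level fusion lets the downstream head exploit cross-modal correlations, while decision-level fusion keeps the branches independent and only reconciles their outputs.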