In the past few years, we have witnessed the rapid development of autonomous driving. However, achieving full autonomy remains a daunting task due to the complex and dynamic driving environment. As a result, self-driving cars are equipped with a suite of sensors to conduct robust and accurate environment perception. As the number and types of sensors keep increasing, combining them for better perception has become a natural trend. So far, there has been no in-depth review that focuses on multi-sensor fusion based perception. To bridge this gap and motivate future research, this survey is devoted to reviewing recent fusion-based 3D detection deep learning models that leverage multiple sensor data sources, especially cameras and LiDARs. In this survey, we first introduce the background of popular sensors used in autonomous cars, including their common data representations and the object detection networks developed for each type of sensor data. Next, we discuss popular datasets for multi-modal 3D object detection, with a special focus on the sensor data included in each dataset. Then we present an in-depth review of recent multi-modal 3D detection networks, considering three aspects of fusion: fusion location, fusion data representation, and fusion granularity. After this detailed review, we discuss open challenges and point out possible solutions. We hope that our detailed review can help researchers embark on investigations in the area of multi-modal 3D object detection.