Sensor fusion is an essential topic in many perception systems, such as autonomous driving and robotics. Architectures that pair a transformer-based detection head with a CNN-based feature encoder to extract features from raw sensor data have emerged as among the best-performing sensor-fusion 3D-detection frameworks, according to dataset leaderboards. In this work, we provide an in-depth literature survey of recent transformer-based 3D object detection, focusing primarily on sensor fusion. We also briefly cover the basics of Vision Transformers (ViT) so that readers can easily follow the paper. Moreover, we briefly review a few of the less-dominant, non-transformer-based sensor-fusion methods for autonomous driving. In conclusion, we summarize sensor-fusion trends to follow and provoke future research. A more up-to-date summary can be found at: https://github.com/ApoorvRoboticist/Transformers-Sensor-Fusion