Large-scale deployment of autonomous vehicles has been continually delayed due to safety concerns. On the one hand, comprehensive scene understanding is indispensable: without it, the vehicle is vulnerable to rare but complex traffic situations, such as the sudden emergence of unknown objects. However, reasoning from a global context requires access to multiple types of sensors and adequate fusion of multi-modal sensor signals, which is difficult to achieve. On the other hand, the lack of interpretability in learning models also hampers safety, since failure causes cannot be verified. In this paper, we propose a safety-enhanced autonomous driving framework, named Interpretable Sensor Fusion Transformer (InterFuser), to fully process and fuse information from multi-modal multi-view sensors for comprehensive scene understanding and adversarial event detection. In addition, our framework generates intermediate interpretable features that provide richer semantics and are exploited to better constrain actions within safe sets. We conducted extensive experiments on CARLA benchmarks, where our model outperforms prior methods, ranking first on the public CARLA Leaderboard. Our code will be made available at https://github.com/opendilab/InterFuser
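To make the fusion idea concrete, the following is a minimal sketch (not the authors' implementation) of fusing multi-view camera and LiDAR feature tokens with a standard transformer encoder, then decoding a driving action alongside an interpretable per-token intermediate output. All module names, dimensions, and heads are illustrative assumptions; the released code at the repository above defines the actual architecture.

```python
import torch
import torch.nn as nn


class MultiModalFusion(nn.Module):
    """Illustrative multi-modal token fusion with a transformer encoder."""

    def __init__(self, d_model: int = 256, n_heads: int = 8, n_layers: int = 4):
        super().__init__()
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        # Learned embeddings that tag each token with its sensor modality.
        self.modality_embed = nn.Embedding(2, d_model)  # 0: camera, 1: LiDAR
        self.action_head = nn.Linear(d_model, 2)        # e.g. steering, throttle (assumed)
        self.interp_head = nn.Linear(d_model, 1)        # interpretable per-token score (assumed)

    def forward(self, cam_tokens: torch.Tensor, lidar_tokens: torch.Tensor):
        # cam_tokens: (B, Nc, d_model), lidar_tokens: (B, Nl, d_model)
        cam = cam_tokens + self.modality_embed.weight[0]
        lidar = lidar_tokens + self.modality_embed.weight[1]
        fused = self.encoder(torch.cat([cam, lidar], dim=1))
        action = self.action_head(fused.mean(dim=1))  # pooled global driving action
        interp = self.interp_head(fused)              # intermediate interpretable features
        return action, interp


# Usage: fuse 64 camera tokens and 128 LiDAR tokens for a batch of 4 scenes.
model = MultiModalFusion()
action, interp = model(torch.randn(4, 64, 256), torch.randn(4, 128, 256))
```

In this sketch, the interpretable head plays the role described in the abstract: its outputs expose what the model believes about the scene and could be used downstream to restrict the predicted action to a safe set.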