Large-scale deployment of autonomous vehicles has been repeatedly delayed by safety concerns. On the one hand, comprehensive scene understanding is indispensable: without it, a vehicle is vulnerable to rare but complex traffic situations, such as the sudden emergence of unknown objects. However, reasoning over the global context requires access to multiple types of sensors and adequate fusion of multi-modal sensor signals, which is difficult to achieve. On the other hand, the lack of interpretability in learned models also hampers safety, because the causes of failures cannot be verified. In this paper, we propose a safety-enhanced autonomous driving framework, named Interpretable Sensor Fusion Transformer (InterFuser), which fully processes and fuses information from multi-modal, multi-view sensors to achieve comprehensive scene understanding and adversarial event detection. In addition, our framework generates intermediate interpretable features, which provide richer semantics and are exploited to better constrain actions to remain within safe sets. We conducted extensive experiments on CARLA benchmarks, where our model outperforms prior methods and ranks first on the public CARLA Leaderboard.
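The abstract states that interpretable intermediate features are used to keep actions within safe sets, but it does not give the rule. The following is a purely illustrative sketch, not InterFuser's actual safety controller. It assumes the intermediate output can be reduced to distances (in meters) to objects ahead on the planned path. Under that assumption, a speed command is clamped to the largest speed from which the vehicle can still brake to a stop before the nearest object, using the constant-deceleration stopping distance v² / (2a). The names `constrain_speed` and `max_safe_speed`, and the default deceleration and margin values, are hypothetical.

```python
import math
from typing import Sequence


def max_safe_speed(gap_m: float, max_decel: float) -> float:
    """Highest speed (m/s) from which the car can brake to a stop within gap_m.

    Uses the constant-deceleration stopping distance v**2 / (2 * a) = gap,
    so v = sqrt(2 * a * gap). A non-positive gap means the car must stop.
    """
    if gap_m <= 0.0:
        return 0.0
    return math.sqrt(2.0 * max_decel * gap_m)


def constrain_speed(
    desired_speed: float,
    obstacle_distances: Sequence[float],
    max_decel: float = 4.0,  # hypothetical braking capability, m/s^2
    margin_m: float = 2.0,   # hypothetical safety buffer kept in front of objects, m
) -> float:
    """Clamp a planner's desired speed into a simple stopping-distance 'safe set'.

    obstacle_distances: hypothetical interpretable intermediate output, i.e.
    distances (m) along the planned path to objects the perception module detected.
    """
    if not obstacle_distances:
        # Nothing detected ahead: the desired speed is left unchanged.
        return desired_speed
    nearest = min(obstacle_distances)
    limit = max_safe_speed(nearest - margin_m, max_decel)
    # Never exceed the stopping-distance limit, and never command a negative speed.
    return max(0.0, min(desired_speed, limit))


if __name__ == "__main__":
    print(constrain_speed(10.0, [14.5]))  # gap 12.5 m allows exactly 10 m/s
    print(constrain_speed(10.0, [6.5]))   # gap 4.5 m caps speed at 6 m/s
    print(constrain_speed(10.0, [1.0]))   # object inside the margin: stop
```

Because the constraint is computed from an explicit, human-readable quantity (distance to the nearest object), a violation can be traced back to a specific perception output. This is the kind of verifiability that an opaque end-to-end action head lacks.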