Recently, deep learning-based approaches have achieved impressive performance for autonomous driving. However, end-to-end vision-based methods typically have limited interpretability, making the behaviors of the deep networks difficult to explain. Hence, their potential applications could be limited in practice. To address this problem, we propose an interpretable end-to-end vision-based motion planning approach for autonomous driving, referred to as IVMP. Given a set of past surrounding-view images, our IVMP first predicts future egocentric semantic maps in bird's-eye-view space, which are then employed to plan trajectories for self-driving vehicles. The predicted future semantic maps not only provide useful interpretable information, but also allow our motion planning module to handle objects with low probability, thus improving the safety of autonomous driving. Moreover, we develop an optical flow distillation paradigm, which can effectively enhance the network while still maintaining its real-time performance. Extensive experiments on the nuScenes dataset and closed-loop simulation show that our IVMP significantly outperforms state-of-the-art approaches in imitating human drivers with a much higher success rate. Our project page is available at https://sites.google.com/view/ivmp.
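The abstract mentions an optical flow distillation paradigm that strengthens a real-time network by transferring knowledge from a flow-augmented counterpart. The snippet below is a minimal sketch of a generic feature- and output-level distillation loss of this kind, written in PyTorch; the class name `DistillationLoss`, the loss terms, and the weighting hyperparameters are illustrative assumptions and do not reproduce the authors' exact formulation.

```python
# Minimal sketch (not the authors' implementation) of a distillation loss that
# transfers knowledge from a flow-augmented teacher to a lighter, flow-free
# student. Names and weights are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DistillationLoss(nn.Module):
    """Task loss plus feature-level (MSE) and output-level (KL) distillation terms."""

    def __init__(self, alpha: float = 0.5, beta: float = 0.5):
        super().__init__()
        self.alpha = alpha  # weight for the feature-level term
        self.beta = beta    # weight for the output-level term

    def forward(self, student_feat, teacher_feat, student_out, teacher_out, task_loss):
        # Match intermediate features of the student to the (frozen) teacher.
        feat_loss = F.mse_loss(student_feat, teacher_feat.detach())
        # Match the student's output distribution to the teacher's.
        out_loss = F.kl_div(
            F.log_softmax(student_out, dim=1),
            F.softmax(teacher_out.detach(), dim=1),
            reduction="batchmean",
        )
        return task_loss + self.alpha * feat_loss + self.beta * out_loss


# Usage example with random tensors standing in for network outputs.
if __name__ == "__main__":
    criterion = DistillationLoss(alpha=0.5, beta=0.5)
    s_feat, t_feat = torch.randn(4, 64, 50, 50), torch.randn(4, 64, 50, 50)
    s_out, t_out = torch.randn(4, 10), torch.randn(4, 10)
    task_loss = torch.tensor(1.0)
    print(criterion(s_feat, t_feat, s_out, t_out, task_loss))
```

The key design choice in any such scheme is that only the student is deployed at test time, so the teacher's extra optical flow input and computation cost do not affect real-time inference.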