In the last decade, deep learning (DL) approaches have been applied successfully in computer vision (CV) applications. However, DL-based CV models are generally considered black boxes due to their lack of interpretability. This black-box behavior has exacerbated user distrust and has therefore prevented the widespread deployment of DL-based CV models in autonomous driving tasks, even though some of these models outperform humans. For this reason, it is essential to develop explainable DL models for autonomous driving tasks. Explainable DL models can not only boost user trust in autonomy but also serve as a diagnostic tool to identify any defects and weaknesses of the model during the system development phase. In this paper, we propose an explainable end-to-end autonomous driving system based on the Transformer, a state-of-the-art (SOTA) self-attention-based model, that maps visual features from images collected by onboard cameras to potential driving actions with corresponding explanations. The model computes a soft attention over the global features of the image. The results demonstrate the efficacy of our proposed model: it outperforms the benchmark model by a significant margin (in terms of correct prediction of actions and explanations) at lower computational cost.
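The soft attention over global image features mentioned above can be sketched as follows. This is a minimal illustrative implementation, not the paper's actual architecture; the function name, dimensions, and the dot-product scoring are assumptions chosen for clarity.

```python
import numpy as np

def soft_attention(features, query):
    """Soft attention over a set of global image feature vectors.

    features: (N, D) array of N feature vectors (e.g., CNN grid features)
    query:    (D,) query vector (e.g., from the decoder state)
    Returns the attention-weighted context vector (D,) and the weights (N,).
    """
    scores = features @ query                      # (N,) alignment scores
    scores = scores - scores.max()                 # stabilize the softmax
    weights = np.exp(scores) / np.exp(scores).sum()
    context = weights @ features                   # (D,) convex combination
    return context, weights

# Toy example: three feature vectors of dimension four.
feats = np.array([[1., 0., 0., 0.],
                  [0., 1., 0., 0.],
                  [0., 0., 1., 0.]])
q = np.array([2., 0., 0., 0.])
ctx, w = soft_attention(feats, q)
```

Because the weights form a distribution over image regions, they can be visualized directly, which is what makes soft attention a natural building block for explanation.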