Situational awareness is central to safety-critical automation systems, so perceiving risk in driving scenes and explaining it is of particular importance for autonomous and cooperative driving. Toward this goal, this paper proposes a new research direction: joint risk localization in driving scenes and explanation of that risk as a natural language description. Due to the lack of standard benchmarks, we collected a large-scale dataset, DRAMA (Driving Risk Assessment Mechanism with A captioning module), which consists of 17,785 interactive driving scenarios collected in Tokyo, Japan. DRAMA provides video- and object-level questions on driving risks, together with the associated important objects. Answers to these multi-level questions take both closed and open-ended forms, supporting visual captioning as a free-form language description and allowing evaluation of a range of visual captioning capabilities in driving scenarios. Using DRAMA, we explore multiple facets of joint risk localization and captioning in interactive driving scenarios. In particular, we benchmark various multi-task prediction architectures and provide a detailed analysis of joint risk localization and risk captioning. We make the dataset available to the community for further research at https://usa.honda-ri.com/drama.