Traffic accident prediction in driving videos aims to provide early warning of accident occurrence and to support the decision making of safe-driving systems. Previous works usually concentrate on the spatial-temporal correlation of object-level context, but they do not fit the inherent long-tailed data distribution well and are vulnerable to severe environmental changes. In this work, we propose a Cognitive Accident Prediction (CAP) method that explicitly leverages human-inspired cognition, in the form of text descriptions of the visual observations and driver attention, to facilitate model training. In particular, the text description provides dense semantic guidance for the primary context of the traffic scene, while the driver attention draws the model toward the critical regions closely correlated with safe driving. CAP is formulated by an attentive text-to-vision shift fusion module, an attentive scene context transfer module, and a driver attention guided accident prediction module. We leverage the attention mechanism in these modules to explore the core semantic cues for accident prediction. To train CAP, we extend the existing self-collected DADA-2000 dataset (with driver attention annotated for each frame) with factual text descriptions of the visual observations before the accidents. In addition, we construct a new large-scale benchmark, named CAP-DATA, consisting of 11,727 in-the-wild accident videos with over 2.19 million frames, annotated with fact-effect-reason-introspection descriptions and temporal accident frame labels. Extensive experiments validate the superiority of CAP over state-of-the-art approaches. The code, CAP-DATA, and all results will be released at \url{https://github.com/JWFanggit/LOTVS-CAP}.
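To make the text-to-vision fusion idea concrete, below is a minimal sketch (not the authors' released code) of how text description features could attend to visual frame features via cross-attention, in the spirit of the attentive text-to-vision shift fusion module. The class name, feature dimensions, and the use of PyTorch's \texttt{nn.MultiheadAttention} are illustrative assumptions, not the paper's actual implementation.

\begin{verbatim}
# Hypothetical sketch of text-to-vision cross-attention fusion.
# All names and dimensions are assumptions for illustration only.
import torch
import torch.nn as nn

class TextToVisionFusion(nn.Module):
    def __init__(self, dim=256, num_heads=8):
        super().__init__()
        # Cross-attention: visual tokens query the text description tokens.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads,
                                                batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_feat, vis_feat):
        # text_feat: (B, T_text, dim) description embeddings
        # vis_feat:  (B, T_vis, dim)  flattened per-frame visual features
        fused, _ = self.cross_attn(query=vis_feat,
                                   key=text_feat, value=text_feat)
        # Residual fusion keeps the original visual evidence intact.
        return self.norm(vis_feat + fused)

# Toy usage: fuse a 10-token description with a 7x7 feature map.
if __name__ == "__main__":
    fusion = TextToVisionFusion()
    text = torch.randn(2, 10, 256)
    vis = torch.randn(2, 49, 256)
    print(fusion(text, vis).shape)  # torch.Size([2, 49, 256])
\end{verbatim}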