Intersections where vehicles are permitted to turn and interact with vulnerable road users (VRUs) such as pedestrians and cyclists are among the most challenging locations for automated, accurate recognition of road users' behavior. In this paper, we propose a deep conditional generative model for interaction detection at such locations. The model automatically analyzes massive video data to capture the continuity of road users' behavior, a task essential for many intelligent transportation systems, such as traffic safety control and self-driving cars, that depend on understanding road users' locomotion. A Conditional Variational Auto-Encoder (CVAE) based model with Gaussian latent variables is trained to encode road users' behavior and to produce probabilistic, diverse predictions of interactions. The model takes as input road users' type, position, and motion, automatically extracted from videos by a deep learning object detector and optical flow, and generates frame-wise probabilities that represent the dynamics of interactions between a turning vehicle and any VRUs involved. The model's efficacy was validated by testing on real-world datasets acquired from two different intersections. It achieved an F1-score above 0.96 at a right-turn intersection in Germany and 0.89 at a left-turn intersection in Japan, both with very busy traffic flows.
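For illustration, the sketch below shows a minimal conditional VAE of the general kind described above: a Gaussian latent variable with the reparameterization trick, a condition vector standing in for detector and optical-flow features, and a decoder emitting a per-frame interaction probability. The PyTorch framing, layer sizes, feature dimension, standard-normal prior, and all variable names are assumptions made for this sketch, not details of the authors' implementation.

```python
# Minimal CVAE sketch for frame-wise interaction probabilities.
# Dimensions and names are hypothetical; this is not the paper's architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CVAE(nn.Module):
    def __init__(self, cond_dim=64, latent_dim=16, hidden_dim=128):
        super().__init__()
        # Encoder q(z | x, y): condition x (per-frame road-user features) plus label y
        self.enc = nn.Sequential(nn.Linear(cond_dim + 1, hidden_dim), nn.ReLU())
        self.mu = nn.Linear(hidden_dim, latent_dim)
        self.logvar = nn.Linear(hidden_dim, latent_dim)
        # Decoder p(y | x, z): outputs a logit for the frame-wise interaction probability
        self.dec = nn.Sequential(
            nn.Linear(cond_dim + latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1))

    def forward(self, x, y):
        h = self.enc(torch.cat([x, y], dim=-1))
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick for the Gaussian latent variable
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        logit = self.dec(torch.cat([x, z], dim=-1))
        return logit, mu, logvar

def cvae_loss(logit, y, mu, logvar):
    # Reconstruction term: binary cross-entropy on the frame-wise interaction label
    recon = F.binary_cross_entropy_with_logits(logit, y, reduction='sum')
    # KL divergence between q(z | x, y) and a standard-normal prior (an assumed choice here)
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```

At test time, diverse predictions can be drawn by sampling several z from the prior for the same condition x and averaging (or inspecting the spread of) the resulting sigmoid outputs; this is one common way such probabilistic, multi-sample inference is realized, offered here only as an example.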