The pedestrian intention prediction problem is to estimate whether or not a target pedestrian will cross the street. State-of-the-art approaches rely heavily on visual information collected with the front camera of the ego-vehicle to predict the pedestrian's intention. As such, the performance of existing methods degrades significantly when the visual information is unreliable, e.g., when the pedestrian is far from the ego-vehicle or the lighting conditions are poor. In this paper, we design, implement, and evaluate the first pedestrian intention prediction model that integrates motion sensor data gathered with the pedestrian's smartwatch (or smartphone). We propose a novel machine learning architecture that effectively incorporates the motion sensor data to reinforce the visual information, significantly improving performance in adverse situations where the visual information may be unreliable. We also conduct a large-scale data collection effort and present the first pedestrian intention prediction dataset augmented with time-synchronized motion sensor data. The dataset consists of a total of 128 video clips covering different distances and varying lighting conditions. We train our model on the widely used JAAD dataset and our own dataset, and compare its performance with a state-of-the-art model. The results demonstrate that our model outperforms the state-of-the-art method, particularly when the pedestrian is far away (over 70 m) and the lighting conditions are insufficient.
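To make the fusion idea concrete, the sketch below shows one possible way to combine per-frame visual features with the pedestrian's motion sensor (IMU) stream for binary crossing-intention prediction. This is a minimal illustration only, assuming PyTorch, GRU encoders, late fusion, and the feature dimensions shown; none of these choices are specified by the abstract, and the module and variable names are hypothetical.

```python
# Minimal sketch (not the paper's actual architecture): late fusion of visual
# and smartwatch/smartphone IMU features for crossing-intention prediction.
# All dimensions and module choices are illustrative assumptions.
import torch
import torch.nn as nn

class FusionIntentionModel(nn.Module):
    def __init__(self, visual_dim=512, imu_dim=6, hidden_dim=128):
        super().__init__()
        # Encode a sequence of per-frame visual features (e.g., from a CNN backbone).
        self.visual_enc = nn.GRU(visual_dim, hidden_dim, batch_first=True)
        # Encode the time-synchronized accelerometer/gyroscope stream.
        self.imu_enc = nn.GRU(imu_dim, hidden_dim, batch_first=True)
        # Fuse the two modality embeddings and predict crossing vs. not crossing.
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, visual_seq, imu_seq):
        # visual_seq: (batch, T_video, visual_dim); imu_seq: (batch, T_imu, imu_dim)
        _, h_v = self.visual_enc(visual_seq)
        _, h_i = self.imu_enc(imu_seq)
        fused = torch.cat([h_v[-1], h_i[-1]], dim=-1)
        return torch.sigmoid(self.classifier(fused)).squeeze(-1)

# Example: a batch of 4 samples with 16 video frames and 100 IMU readings each.
model = FusionIntentionModel()
p_cross = model(torch.randn(4, 16, 512), torch.randn(4, 100, 6))
print(p_cross.shape)  # torch.Size([4])
```

The intent of the late-fusion design in this sketch is that the IMU branch can still carry signal when the visual branch is degraded (distant pedestrian, poor lighting), which is the failure mode the abstract targets.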