Local and Global Contextual Features Fusion for Pedestrian Intention Prediction

Autonomous vehicles (AVs) are becoming an indispensable part of future transportation. However, safety challenges and a lack of reliability limit their real-world deployment. To increase the presence of AVs on the roads, the interaction of AVs with pedestrians, including prediction of the pedestrian crossing intention, deserves extensive research. This is a highly challenging task, as it involves multiple non-linear parameters. In this direction, we extract and analyse spatio-temporal visual features of both the pedestrian and the traffic context. The pedestrian features include body pose and local context features that represent the pedestrian's behaviour. Additionally, to understand the global context, we utilise location, motion, and environmental information using scene parsing technology; this information represents the pedestrian's surroundings and may affect the pedestrian's intention. Finally, these multi-modal features are fused for effective intention prediction learning. The experimental results of the proposed model on the JAAD dataset show a superior result on the combined AUC and F1-score compared to the state-of-the-art.

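The abstract reports results in terms of AUC and F1-score. As a reminder of what these two metrics measure for a binary crossing / not-crossing classifier, here is a minimal pure-Python sketch. The labels and scores are toy values invented for illustration (they are not JAAD data), the 0.5 decision threshold is an assumption, and the function names are hypothetical; this is not the authors' evaluation code.

```python
def f1_score(y_true, y_pred):
    """F1 = 2PR / (P + R) for binary labels (1 = crossing, 0 = not crossing)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (invented values): ground-truth intention and predicted probabilities.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.7]
preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

print("F1: ", round(f1_score(labels, preds), 3))
print("AUC:", round(roc_auc(labels, scores), 3))
```

F1 depends on the chosen decision threshold, while AUC is threshold-free and reflects only how well the scores rank positives above negatives, which is one reason the two are often reported together.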