Accurate, long-term forecasting of pedestrian trajectories in highly dynamic and interactive scenes is a long-standing challenge. Recent data-driven approaches have significantly improved prediction accuracy. However, the lack of group-aware analysis has limited the performance of forecasting models. This is especially apparent in highly populated scenes, where pedestrians move in groups and the interactions between groups are extremely complex and dynamic. In this paper, we present Grouptron, a multi-scale dynamic forecasting framework that leverages pedestrian group detection and uses individual-level, group-level, and scene-level information to better understand and represent the scenes. Our approach employs spatio-temporal clustering algorithms to identify pedestrian groups and creates spatio-temporal graphs at the individual, group, and scene levels. It then uses graph neural networks to encode dynamics at each scale and combines the encodings across scales for trajectory prediction. We carried out extensive comparisons and ablation experiments to demonstrate the effectiveness of our approach. Our method achieves a 9.3% decrease in final displacement error (FDE) compared with state-of-the-art methods on the ETH/UCY benchmark datasets, and a 16.1% decrease in FDE in more crowded scenes, where group interactions are more frequent.
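For readers unfamiliar with the metric, the following is a minimal sketch of how FDE is commonly computed: the Euclidean distance between the predicted and the ground-truth position at the final time step, averaged over pedestrians. The function names, the trajectory layout (a list of (x, y) points per pedestrian), and the numbers are illustrative assumptions and are not taken from the Grouptron implementation or its results.

```python
import math


def final_displacement_error(predicted, ground_truth):
    """Mean Euclidean distance between predicted and true final positions.

    Each argument is a list of trajectories; each trajectory is a list of
    (x, y) points. This layout is an assumption made for illustration.
    """
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("need the same, non-zero number of trajectories")
    total = 0.0
    for pred_traj, true_traj in zip(predicted, ground_truth):
        (px, py), (tx, ty) = pred_traj[-1], true_traj[-1]
        total += math.hypot(px - tx, py - ty)  # distance at the last step only
    return total / len(predicted)


def relative_decrease(baseline, improved):
    """Fractional reduction of an error metric relative to a baseline."""
    return (baseline - improved) / baseline


# Hypothetical toy data: two pedestrians, three time steps each.
predicted = [[(0, 0), (1, 0), (2, 0)], [(0, 0), (0, 1), (0, 2)]]
ground_truth = [[(0, 0), (1, 0), (5, 4)], [(0, 0), (0, 1), (0, 2)]]

fde = final_displacement_error(predicted, ground_truth)
print(fde)  # 2.5: final-step errors of 5.0 and 0.0, averaged
```

A percentage such as those reported above corresponds to `relative_decrease(baseline_fde, model_fde) * 100`, where both arguments are FDE values measured on the same test split.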