Accurate, long-term forecasting of pedestrian trajectories in highly dynamic and interactive scenes is a long-standing challenge. Recent data-driven approaches have substantially improved prediction accuracy. However, the lack of group-aware analysis has limited the performance of forecasting models. This limitation is especially apparent in densely populated scenes, where pedestrians move in groups and the interactions between groups are complex and dynamic. In this paper, we present Grouptron, a multi-scale dynamic forecasting framework that leverages pedestrian group detection and uses individual-level, group-level, and scene-level information to better understand and represent scenes. Our approach employs spatio-temporal clustering algorithms to identify pedestrian groups and constructs spatio-temporal graphs at the individual, group, and scene levels. It then uses graph neural networks to encode dynamics at each scale and combines the encodings across scales for trajectory prediction. We carried out extensive comparisons and ablation experiments to demonstrate the effectiveness of our approach. Our method achieves a 9.3% decrease in final displacement error (FDE) compared with state-of-the-art methods on the ETH/UCY benchmark datasets, and a 16.1% decrease in FDE in more crowded scenes, where group interactions occur more frequently.
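To make the group-detection step and the FDE metric more concrete, here is a minimal sketch. It assumes every pedestrian is observed over the same time window at a fixed, unit time step. In place of Grouptron's actual clustering algorithm, it uses a simple pairwise rule: two pedestrians are linked when their positions stay within `max_dist` and their velocities stay within `max_vel_diff` at every step, and linked pedestrians are merged into groups with union-find. The function names and thresholds are hypothetical. FDE is computed here as the mean Euclidean distance between predicted and ground-truth final positions. Benchmark protocols often report a best-of-K variant instead, which this sketch does not cover.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def _velocities(track: List[Point]) -> List[Point]:
    # Finite-difference velocity per step (unit time step assumed).
    return [(b[0] - a[0], b[1] - a[1]) for a, b in zip(track, track[1:])]


def detect_groups(tracks: Dict[str, List[Point]],
                  max_dist: float = 1.5,
                  max_vel_diff: float = 0.5) -> List[List[str]]:
    """Hypothetical group detector: link pedestrians that stay close and move
    alike over the whole window, then return connected components."""
    ids = sorted(tracks)
    parent = {i: i for i in ids}

    def find(i: str) -> str:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for n, a in enumerate(ids):
        for b in ids[n + 1:]:
            pa, pb = tracks[a], tracks[b]
            # Tracks are assumed time-aligned and of equal length.
            close = all(math.dist(p, q) <= max_dist for p, q in zip(pa, pb))
            va, vb = _velocities(pa), _velocities(pb)
            similar = all(math.dist(u, v) <= max_vel_diff for u, v in zip(va, vb))
            if close and similar:
                union(a, b)

    groups: Dict[str, List[str]] = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def final_displacement_error(pred: Dict[str, List[Point]],
                             gt: Dict[str, List[Point]]) -> float:
    # Mean Euclidean distance between predicted and true final positions.
    return sum(math.dist(pred[k][-1], gt[k][-1]) for k in gt) / len(gt)


def relative_decrease(baseline: float, ours: float) -> float:
    # Percentage reduction of an error metric relative to a baseline.
    return 100.0 * (baseline - ours) / baseline


if __name__ == "__main__":
    tracks = {
        "a": [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
        "b": [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)],
        "c": [(5.0, 5.0), (5.0, 4.0), (5.0, 3.0)],
    }
    print(detect_groups(tracks))  # [['a', 'b'], ['c']]
```

In the usage example, pedestrians `a` and `b` walk side by side one meter apart with identical velocities, so they form a group, while `c` is far away and moving in a different direction. A fixed-threshold rule like this is easy to reason about but brittle in crowds. That brittleness is one reason learned or density-based spatio-temporal clustering is typically preferred in practice.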