Joint forecasting of human trajectory and pose dynamics is a fundamental building block of various applications ranging from robotics and autonomous driving to surveillance systems. Predicting body dynamics requires capturing subtle information embedded in humans' interactions with each other and with the objects present in the scene. In this paper, we propose a novel TRajectory and POse Dynamics (nicknamed TRiPOD) method based on graph attention networks to model the human-human and human-object interactions both in the input space and in the output space (the decoded future output). The model is supplemented by a message-passing interface over the graphs to fuse these different levels of interactions efficiently. Furthermore, to address a real-world challenge, we propose learning an indicator of whether an estimated body joint is visible or invisible at each frame, e.g. due to occlusion or being outside the sensor field of view. Finally, we introduce a new benchmark for this joint task based on two challenging datasets (PoseTrack and 3DPW) and propose evaluation metrics that measure the effectiveness of predictions in the global space, even in the presence of invisible joints. Our evaluation shows that TRiPOD outperforms all prior work, including state-of-the-art methods designed specifically for either trajectory or pose forecasting.
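To make the interaction-modeling idea concrete, below is a minimal sketch of a single graph-attention message-passing step over a scene graph whose nodes are humans and objects. This is an illustrative toy, not the authors' implementation: the class name `InteractionGATLayer`, the feature dimensions, and the dense adjacency mask `adj` are all assumptions chosen for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InteractionGATLayer(nn.Module):
    """One graph-attention message-passing step over an interaction graph.

    Nodes represent humans/objects; `adj` masks which pairs interact.
    Hypothetical sketch -- names and dimensions are illustrative only.
    """

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)   # shared node projection
        self.a = nn.Linear(2 * out_dim, 1, bias=False)    # attention scorer

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (N, in_dim) node features; adj: (N, N) 0/1 interaction mask
        h = self.W(x)                                     # (N, out_dim)
        N = h.size(0)
        # Pairwise concatenation [h_i || h_j] for every candidate edge
        pairs = torch.cat(
            [h.unsqueeze(1).expand(N, N, -1),
             h.unsqueeze(0).expand(N, N, -1)], dim=-1)    # (N, N, 2*out_dim)
        e = F.leaky_relu(self.a(pairs).squeeze(-1))       # (N, N) raw scores
        e = e.masked_fill(adj == 0, float('-inf'))        # keep only real edges
        alpha = torch.softmax(e, dim=-1)                  # attention per neighbor
        return F.elu(alpha @ h)                           # aggregated messages

# Toy usage: 3 humans + 2 objects as nodes of one scene graph
x = torch.randn(5, 16)
adj = torch.ones(5, 5)          # fully connected toy interaction graph
out = InteractionGATLayer(16, 32)(x, adj)
print(out.shape)                # torch.Size([5, 32])
```

In this style of layer, stacking one such step per interaction level (human-human, human-object) and fusing the resulting node features corresponds to the kind of message passing over multiple graphs described above.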