Human motion prediction aims to forecast future human poses given a sequence of past 3D skeletons. While this problem has recently received increasing attention, it has mostly been tackled for single humans in isolation. In this paper, we explore the problem from a novel perspective: humans performing collaborative tasks. We assume that the input to our system consists of two sequences of past skeletons, one for each of two interacting persons, and we aim to predict the future motion of each of them. For this purpose, we devise a novel cross-interaction attention mechanism that exploits the historical information of both persons and learns to predict cross dependencies between a person's own poses and the poses of the other person, regardless of their spatial or temporal distance. Since no dataset for training on such interactive situations is available, we have captured ExPI (Extreme Pose Interaction), a new lab-based person interaction dataset of professional dancers performing acrobatics. ExPI contains 115 sequences with 30k frames and 60k instances with annotated 3D body poses and shapes. We thoroughly evaluate our cross-interaction network on this dataset and show that, in both short-term and long-term predictions, it consistently outperforms baselines that reason about each person independently. We plan to release our code jointly with the dataset and the train/test splits to spur future research on the topic.
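To make the idea of attending across the two persons' pose histories concrete, the following is a minimal sketch of generic scaled dot-product cross-attention in NumPy. It is not the architecture proposed in the paper, whose details the abstract does not give. The function name `cross_attention`, the projection matrices, and the sizes (50 past frames, 18 joints, 64 feature dimensions) are all illustrative assumptions, and random arrays stand in for real skeleton data.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention(self_hist, other_hist, w_q, w_k, w_v):
    """Generic cross-attention (hypothetical helper, not the paper's module).

    Queries come from one person's pose history; keys and values come
    from the other person's history.
    self_hist, other_hist: arrays of shape (T, F), one flattened pose per frame.
    w_q, w_k, w_v: projection matrices of shape (F, d).
    Returns the attended features (T, d) and the attention weights (T, T).
    """
    q = self_hist @ w_q
    k = other_hist @ w_k
    v = other_hist @ w_v
    # Every frame of one person scores every frame of the other person,
    # so temporally distant frames can still influence each other.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


# Assumed sizes, chosen only for illustration.
T, J, d = 50, 18, 64
rng = np.random.default_rng(0)
person_a = rng.normal(size=(T, J * 3))  # stand-in for person A's past 3D skeletons
person_b = rng.normal(size=(T, J * 3))  # stand-in for person B's past 3D skeletons
w_q, w_k, w_v = (rng.normal(scale=0.1, size=(J * 3, d)) for _ in range(3))

# Each person attends to the other person's history.
a_ctx, a_weights = cross_attention(person_a, person_b, w_q, w_k, w_v)
b_ctx, b_weights = cross_attention(person_b, person_a, w_q, w_k, w_v)
```

In a trained model the projection matrices would be learned, and the attended features would feed a decoder that outputs future poses. Here they are random, so only the shapes and the normalization of the weights are meaningful.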