Human motion prediction aims to forecast future poses given a sequence of past 3D skeletons. While this problem has recently received increasing attention, it has mostly been tackled for single humans in isolation. In this paper, we explore this problem for humans performing collaborative tasks: we seek to predict the future motion of two interacting persons given two sequences of their past skeletons. We propose a novel cross interaction attention mechanism that exploits the historical information of both persons and learns to predict cross dependencies between the two pose sequences. Since no dataset for training on such interactive situations is available, we collected ExPI (Extreme Pose Interaction), a new lab-based person interaction dataset of professional dancers performing Lindy-hop dancing actions, which contains 115 sequences with 30K frames annotated with 3D body poses and shapes. We thoroughly evaluate our cross interaction network on ExPI and show that, in both short- and long-term predictions, it consistently outperforms state-of-the-art methods for single-person motion prediction.
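The idea of letting one person's motion history attend to the other's can be illustrated with a minimal sketch. It assumes plain scaled dot-product attention in which each past frame of one person queries all past frames of the other, using identity query/key/value projections and a residual update. The helper names (`cross_attention`, `cross_interaction`) and the toy 2-D features are hypothetical illustrations, not the cross interaction attention architecture of this paper.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]  # rows = past frames, columns = flattened pose features


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query_seq: Matrix, context_seq: Matrix) -> Matrix:
    """Scaled dot-product attention: every frame of query_seq attends over
    all frames of context_seq. Identity Q/K/V projections (an assumption)."""
    d = len(query_seq[0])
    # scores[i][j] = <query frame i, context frame j> / sqrt(d)
    scores = [[sum(q * k for q, k in zip(qf, kf)) / math.sqrt(d) for kf in context_seq]
              for qf in query_seq]
    weights = [softmax(row) for row in scores]  # each row sums to 1
    # Output frame i is the attention-weighted average of the context frames.
    return [[sum(w * kf[c] for w, kf in zip(wrow, context_seq)) for c in range(d)]
            for wrow in weights]


def cross_interaction(seq_a: Matrix, seq_b: Matrix) -> Tuple[Matrix, Matrix]:
    """Hypothetical symmetric exchange: each person's history is enriched with
    what it attends to in the other person's history, via a residual add."""
    upd_a = cross_attention(seq_a, seq_b)  # person A queries person B
    upd_b = cross_attention(seq_b, seq_a)  # person B queries person A
    new_a = [[x + u for x, u in zip(row, urow)] for row, urow in zip(seq_a, upd_a)]
    new_b = [[x + u for x, u in zip(row, urow)] for row, urow in zip(seq_b, upd_b)]
    return new_a, new_b


# Toy usage: two persons, 3 past frames each, 2-D features (not real skeletons).
leader = [[0.0, 1.0], [0.5, 1.0], [1.0, 1.0]]
follower = [[1.0, 0.0], [1.0, 0.5], [1.0, 1.0]]
new_leader, new_follower = cross_interaction(leader, follower)
print(len(new_leader), len(new_leader[0]))  # 3 2
```

Each updated frame is the original frame plus a convex combination of the partner's frames, so the shapes of both histories are preserved. In a trained model the projections would be learned weights rather than identities.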