Multi-person motion prediction remains a challenging problem, especially in jointly learning representations of individual motion and social interactions. Most prior methods learn only local pose dynamics for individual motion (without the global body trajectory) and struggle to capture the complex interaction dependencies underlying social behavior. In this paper, we propose a novel Social-Aware Motion Transformer (SoMoFormer) that effectively models individual motion and social interactions in a joint manner. Specifically, SoMoFormer extracts motion features from sub-sequences in displacement trajectory space to learn both local and global pose dynamics for each individual. In addition, we devise a novel social-aware motion attention mechanism in SoMoFormer that simultaneously refines dynamics representations and captures interaction dependencies via motion similarity computed across the temporal and social dimensions. We empirically evaluate our framework on multi-person motion datasets over both short- and long-term horizons and demonstrate that our method substantially outperforms state-of-the-art single- and multi-person motion prediction methods. Code will be made publicly available upon acceptance.
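The social-aware attention described above can be illustrated with a minimal sketch: per-person motion features are flattened across the social and temporal dimensions into one token sequence, and attention weights are derived from pairwise motion similarity. This is an assumption-laden toy in NumPy, not the paper's implementation; the feature shapes, similarity measure, and function name are hypothetical.

```python
import numpy as np

def social_motion_attention(x):
    """Toy similarity-based attention across people and time.

    x: array of shape (P, T, D) -- per-person motion features
    (displacement-space embeddings in the paper; arbitrary here).
    Tokens from all people and time steps attend to each other, so
    interaction dependencies and temporal dynamics are captured jointly.
    """
    P, T, D = x.shape
    tokens = x.reshape(P * T, D)                    # flatten social x time
    scores = tokens @ tokens.T / np.sqrt(D)         # motion similarity
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    out = weights @ tokens                          # aggregate dynamics
    return out.reshape(P, T, D)
```

Because every (person, time) token attends over the full social-temporal grid, a single attention layer can mix one person's past motion with another person's, which is the intuition behind modeling individual dynamics and interactions jointly.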