The task of collaborative human pose forecasting consists of predicting the future poses of multiple interacting people, given their poses in previous frames. Predicting two interacting people jointly, instead of each one separately, promises better performance because their body motions are correlated. Yet the task has so far remained largely unexplored. In this paper, we review the progress in human pose forecasting and provide an in-depth assessment of which single-person practices perform best for 2-body collaborative motion forecasting. Our study confirms the positive impact of frequency-domain input representations, of space-time separable and fully learnable interaction adjacencies in the GCN encoder, and of FC decoding. Other single-person practices do not transfer to the 2-body setting; hence, the proposed best practices do not include hierarchical body modeling or attention-based interaction encoding. We further contribute a novel initialization procedure for the 2-body spatial interaction parameters of the encoder, which improves both performance and stability. Altogether, our proposed 2-body pose forecasting best practices yield a 21.9% performance improvement over the state of the art on the most recent ExPI dataset, with the novel initialization accounting for 3.5%. See our project page at https://www.pinlab.org/bestpractices2body
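To make the listed practices concrete, the following NumPy sketch wires together a DCT frequency-domain input, a simplified space-time separable graph convolution, and an FC decoder over time. It is an illustration under stated assumptions, not the authors' implementation. The function names, the shared (rather than per-frame or per-joint) adjacencies, the tanh activation, and all sizes are hypothetical choices. The two people's joints are stacked into one graph of 2J nodes, so the off-diagonal J x J blocks of the fully learnable spatial adjacency `a_s` carry the 2-body interaction parameters. These blocks are simply randomly initialized here; the paper's proposed initialization is not reproduced.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix (n x n): row k holds frequency k."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    d[0, :] = np.sqrt(1.0 / n)
    return d

def to_frequency(x):
    """(T, V, C) joint trajectories -> (T, V, C) DCT coefficients along time."""
    return np.einsum("kt,tvc->kvc", dct_matrix(x.shape[0]), x)

def from_frequency(y):
    """Inverse transform: D is orthonormal, so D^T undoes it."""
    return np.einsum("kt,kvc->tvc", dct_matrix(y.shape[0]), y)

def sts_gcn_layer(x, a_s, a_t, w, activate=True):
    """Simplified space-time separable graph conv (hypothetical variant).

    x: (T, V, C_in), a_t: (T, T) temporal adjacency,
    a_s: (V, V) spatial adjacency over both bodies, w: (C_in, C_out).
    """
    h = np.einsum("ts,svc->tvc", a_t, x)   # mix along time
    h = np.einsum("uv,tvc->tuc", a_s, h)   # mix along joints of both people
    h = np.einsum("tvc,cd->tvd", h, w)     # mix feature channels
    return np.tanh(h) if activate else h

def init_params(t_in, t_out, n_joints, channels, rng):
    """Random init; the paper's interaction-parameter initialization is NOT used."""
    v = 2 * n_joints  # joints of person 1 followed by joints of person 2
    dims = [3] + channels + [3]
    layers = []
    for c_in, c_out in zip(dims[:-1], dims[1:]):
        a_s = 0.1 * rng.standard_normal((v, v))      # fully learnable, incl. cross-body blocks
        a_t = 0.1 * rng.standard_normal((t_in, t_in))
        w = rng.standard_normal((c_in, c_out)) / np.sqrt(c_in)
        layers.append((a_s, a_t, w))
    w_dec = rng.standard_normal((t_out, t_in)) / np.sqrt(t_in)  # FC decoder over time
    return layers, w_dec

def forecast(past, layers, w_dec):
    """past: (T_in, 2J, 3) observed poses -> (T_out, 2J, 3) predicted poses."""
    h = to_frequency(past)
    for i, (a_s, a_t, w) in enumerate(layers):
        h = sts_gcn_layer(h, a_s, a_t, w, activate=i < len(layers) - 1)
    h = from_frequency(h)
    return np.einsum("ot,tvc->ovc", w_dec, h)

# Illustrative sizes only, not the paper's settings.
rng = np.random.default_rng(0)
T_IN, T_OUT, J = 10, 25, 18
layers, w_dec = init_params(T_IN, T_OUT, J, [16, 16], rng)
past = rng.standard_normal((T_IN, 2 * J, 3))
future = forecast(past, layers, w_dec)
print(future.shape)  # (25, 36, 3)
```

Factoring the adjacency into a temporal and a spatial matrix keeps the parameter count at T^2 + V^2 instead of (TV)^2, which is the usual motivation for space-time separable graph convolutions.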