Human motion prediction is a challenging task due to the stochasticity and aperiodicity of future poses. Recently, graph convolutional networks have proven very effective at learning dynamic relations among pose joints, which is helpful for pose prediction. On the other hand, one can abstract a human pose recursively to obtain a set of poses at multiple scales. As the abstraction level increases, the motion of the pose becomes more stable, which also benefits pose prediction. In this paper, we propose a novel Multi-Scale Residual Graph Convolution Network (MSR-GCN) for the human pose prediction task in an end-to-end manner. GCNs are used to extract features from fine to coarse scale and then from coarse to fine scale. The extracted features at each scale are then combined and decoded to obtain the residuals between the input and target poses. Intermediate supervision is imposed on all the predicted poses, which encourages the network to learn more representative features. Our proposed approach is evaluated on two standard benchmark datasets, i.e., the Human3.6M dataset and the CMU Mocap dataset. Experimental results demonstrate that our method outperforms the state-of-the-art approaches. Code and pre-trained models are available at https://github.com/Droliven/MSRGCN.
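The following is a minimal, self-contained sketch (not the authors' released implementation) of the multi-scale residual idea described above: GCN blocks extract features from fine to coarse scales, a decoder at each scale predicts a residual that is added to the input pose sequence, and intermediate supervision is applied to every scale's output. The number of scales, joint counts, feature sizes, and the joint-grouping (pooling) matrices below are hypothetical placeholders, not the published configuration.

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """Graph convolution with a learnable adjacency over the pose joints."""
    def __init__(self, num_nodes, in_feats, out_feats):
        super().__init__()
        self.adj = nn.Parameter(torch.eye(num_nodes))                     # learnable joint relations
        self.weight = nn.Parameter(torch.randn(in_feats, out_feats) * 0.01)

    def forward(self, x):                                                 # x: (batch, nodes, feats)
        return torch.tanh(self.adj @ x @ self.weight)

class MultiScaleResidualSketch(nn.Module):
    """Fine-to-coarse GCN encoders with a residual decoder at each scale."""
    def __init__(self, joints=(22, 12, 7), feats=64, frames=10):          # hypothetical sizes
        super().__init__()
        self.enc = nn.ModuleList(GraphConv(j, frames * 3, feats) for j in joints)
        self.dec = nn.ModuleList(GraphConv(j, feats, frames * 3) for j in joints)
        # Hypothetical fixed pooling matrices that map fine joints to coarser groups.
        self.down = nn.ParameterList(
            nn.Parameter(torch.rand(joints[i + 1], joints[i]), requires_grad=False)
            for i in range(len(joints) - 1))

    def forward(self, pose_seq):                                          # (batch, joints, frames*3)
        feats, x = [], pose_seq
        for i, enc in enumerate(self.enc):                                # fine -> coarse
            feats.append(enc(x))
            if i < len(self.down):
                x = self.down[i] @ x                                      # pool joints to next scale
        preds, coarse_input = [], pose_seq
        for i, dec in enumerate(self.dec):                                # decode one residual per scale
            inp = coarse_input if i == 0 else self.down[i - 1] @ coarse_input
            coarse_input = inp
            preds.append(inp + dec(feats[i]))                             # residual prediction
        return preds                                                      # one predicted sequence per scale

if __name__ == "__main__":
    model = MultiScaleResidualSketch()
    history = torch.randn(4, 22, 10 * 3)                                  # toy observed sequence
    targets = [torch.randn(4, j, 10 * 3) for j in (22, 12, 7)]            # toy multi-scale targets
    preds = model(history)
    # Intermediate supervision: sum losses over all scales, not just the finest one.
    loss = sum(nn.functional.mse_loss(p, t) for p, t in zip(preds, targets))
    loss.backward()
    print([tuple(p.shape) for p in preds], loss.item())
```

In this sketch the coarser scales have fewer nodes, so their motion is smoother and easier to fit; supervising every scale is what pushes the shared features to stay informative across abstraction levels.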