3D skeleton-based motion prediction and activity recognition are two interwoven tasks in human behaviour analysis. In this work, we propose a motion context modeling methodology that offers a new way to combine the advantages of graph convolutional neural networks and recurrent neural networks for joint human motion prediction and activity recognition. Our approach uses an LSTM encoder-decoder together with a non-local feature extraction attention mechanism to model both the spatial correlation within human skeleton data and the temporal correlation among motion frames. The proposed network can easily include two output branches, one for activity recognition and one for future motion prediction, which can be trained jointly for enhanced performance. Experimental results on the Human3.6M, CMU Mocap and NTU RGB-D datasets show that our approach provides the best prediction capability among baseline LSTM-based methods, while performing comparably to other state-of-the-art methods.
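To make the two-branch design concrete, here is a minimal NumPy sketch of the data flow, written under assumptions that the abstract does not confirm. It treats the non-local block as the embedded-Gaussian form of Wang et al.'s non-local neural networks, applied across the joints of each frame to capture spatial correlation. An LSTM encoder then models the temporal correlation over the observed frames. The recognition head reads the final encoder state, the decoder predicts residual offsets from the last observed pose, and the joint objective is MSE plus a weighted cross-entropy. All names (`JointModel`, `non_local`, `joint_loss`, `lam`) and layer sizes are hypothetical. The weights are random and untrained, so the sketch only illustrates shapes and wiring, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def non_local(x, W_theta, W_phi, W_g, W_out):
    """Embedded-Gaussian non-local block over joints (assumed formulation).
    x: (J, C) features of the J joints in one frame. Every joint attends
    to every other joint; a residual connection keeps the input signal."""
    theta, phi, g = x @ W_theta, x @ W_phi, x @ W_g  # (J, D) each
    attn = softmax(theta @ phi.T, axis=-1)           # (J, J) pairwise affinities
    return x + (attn @ g) @ W_out                    # (J, C)


class LSTMCell:
    """Plain LSTM cell: input, forget, output gates and candidate update."""

    def __init__(self, n_in, n_hid):
        self.W = rng.normal(0.0, 0.1, (n_in + n_hid, 4 * n_hid))
        self.b = np.zeros(4 * n_hid)

    def __call__(self, x, h, c):
        z = np.concatenate([x, h]) @ self.W + self.b
        i, f, o, u = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(u)
        h = sigmoid(o) * np.tanh(c)
        return h, c


class JointModel:
    """Hypothetical encoder-decoder with a prediction and a recognition branch."""

    def __init__(self, n_joints, n_classes, c_in=3, d=8, n_hid=32):
        self.J, self.c_in, self.n_hid = n_joints, c_in, n_hid
        self.W_theta, self.W_phi, self.W_g = (rng.normal(0.0, 0.1, (c_in, d)) for _ in range(3))
        self.W_out = rng.normal(0.0, 0.1, (d, c_in))
        self.encoder = LSTMCell(n_joints * c_in, n_hid)
        self.decoder = LSTMCell(n_joints * c_in, n_hid)
        self.W_pose = rng.normal(0.0, 0.1, (n_hid, n_joints * c_in))  # prediction branch
        self.W_cls = rng.normal(0.0, 0.1, (n_hid, n_classes))         # recognition branch

    def forward(self, seq, horizon):
        """seq: (T, J, 3) observed poses -> (future poses (horizon, J, 3), class probs)."""
        h = c = np.zeros(self.n_hid)
        for frame in seq:  # spatial attention per frame, temporal modeling by the LSTM
            feat = non_local(frame, self.W_theta, self.W_phi, self.W_g, self.W_out)
            h, c = self.encoder(feat.ravel(), h, c)
        class_probs = softmax(h @ self.W_cls)  # recognition from the final encoder state
        pose, preds = seq[-1].ravel(), []
        for _ in range(horizon):  # autoregressive decoding from the last observed pose
            h, c = self.decoder(pose, h, c)
            pose = pose + h @ self.W_pose  # residual (velocity-style) update
            preds.append(pose.reshape(self.J, self.c_in))
        return np.stack(preds), class_probs


def joint_loss(pred, target, probs, label, lam=1.0):
    """Assumed joint objective: pose MSE plus lam-weighted cross-entropy."""
    return np.mean((pred - target) ** 2) - lam * np.log(probs[label] + 1e-12)


model = JointModel(n_joints=17, n_classes=15)
obs = rng.normal(size=(10, 17, 3))               # 10 observed frames, 17 joints
future, probs = model.forward(obs, horizon=5)    # future: (5, 17, 3), probs: (15,)
loss = joint_loss(future, np.zeros_like(future), probs, label=3)
```

Sharing one encoder state between both heads is what lets a single backward pass through `joint_loss` update the shared representation from both tasks; the weight `lam` (an assumed hyperparameter) trades off prediction accuracy against recognition accuracy.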