In this paper, we develop a neural network model to predict future human motion from an observed human motion history. We propose a non-autoregressive transformer architecture to leverage its parallel nature for easier training and fast, accurate predictions at test time. The proposed architecture divides human motion prediction into two parts: 1) the human trajectory, which is the 3D position of the hip joint over time, and 2) the human pose, which is the 3D positions of all other joints over time relative to a fixed hip joint. We propose to make the two predictions simultaneously, as the shared representation can improve model performance. The model therefore consists of two sets of encoders and decoders. First, a multi-head attention module applied to the encoder outputs improves human trajectory prediction. Second, another multi-head self-attention module, applied to the encoder outputs concatenated with the decoder outputs, facilitates learning of temporal dependencies. Our model is well-suited for robotic applications in terms of test accuracy and speed, and compares favorably with state-of-the-art methods. We demonstrate the real-world applicability of our work via the Robot Follow-Ahead task, a challenging yet practical case study for our proposed model.
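The trajectory/pose decomposition described above can be illustrated with a minimal sketch. It is not the authors' implementation: the data layout (a sequence of frames, each a list of `(x, y, z)` joint positions), the helper names `split_motion` and `merge_motion`, and the choice of joint 0 as the hip are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def split_motion(
    motion: Sequence[Sequence[Vec3]], hip_index: int = 0
) -> Tuple[List[Vec3], List[List[Vec3]]]:
    """Split a motion sequence into a hip trajectory and a hip-relative pose.

    hip_index is an assumed convention; real skeleton formats differ.
    """
    trajectory: List[Vec3] = []
    pose: List[List[Vec3]] = []
    for frame in motion:
        hx, hy, hz = frame[hip_index]
        # Trajectory: the global 3D hip position at this time step.
        trajectory.append((hx, hy, hz))
        # Pose: every other joint, expressed relative to the hip.
        pose.append([
            (x - hx, y - hy, z - hz)
            for j, (x, y, z) in enumerate(frame)
            if j != hip_index
        ])
    return trajectory, pose


def merge_motion(
    trajectory: Sequence[Vec3],
    pose: Sequence[Sequence[Vec3]],
    hip_index: int = 0,
) -> List[List[Vec3]]:
    """Recombine a trajectory and a hip-relative pose into global joint positions."""
    motion: List[List[Vec3]] = []
    for (hx, hy, hz), rel in zip(trajectory, pose):
        frame = [(x + hx, y + hy, z + hz) for (x, y, z) in rel]
        frame.insert(hip_index, (hx, hy, hz))
        motion.append(frame)
    return motion


# Two frames, three joints each; joint 0 is the (assumed) hip.
motion = [
    [(1.0, 0.0, 1.0), (1.0, 0.0, 2.0), (1.5, 0.0, 0.0)],
    [(2.0, 0.0, 1.0), (2.0, 0.0, 2.0), (2.5, 0.0, 0.0)],
]
trajectory, pose = split_motion(motion)
```

In this toy input the person translates along the x-axis without changing posture, so the trajectory changes between frames while the hip-relative pose stays identical. This is the kind of separation that lets the two quantities be predicted by separate encoder-decoder pairs.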