Predicting human motion from historical pose sequences is crucial for machines to interact intelligently with humans. One aspect that has been overlooked so far is that the way we represent the skeletal pose has a critical impact on prediction results, yet no prior work has systematically compared different pose representation schemes. We conduct an in-depth study of various pose representations, focusing on their effects on the motion prediction task. Moreover, recent approaches build upon off-the-shelf RNN units for motion prediction. These approaches process the input pose sequence step by step and inherently have difficulty capturing long-term dependencies. In this paper, we propose a novel RNN architecture termed AHMR (Attentive Hierarchical Motion Recurrent network) for motion prediction, which simultaneously models local motion contexts and a global context. We further explore a geodesic loss and a forward kinematics loss for the motion prediction task, both of which are more geometrically meaningful than the widely employed L2 loss. We also apply our method to a range of articulated objects, including humans, fish, and mice. Empirical results show that our approach outperforms state-of-the-art methods in short-term prediction and substantially improves long-term prediction, for example producing natural, human-like motion over 50-second prediction horizons. Our code has been released.
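The abstract contrasts geodesic and forward kinematics losses with a plain L2 loss but does not define them. The following is a minimal sketch of one common formulation, assuming each pose is given as per-joint axis-angle (exponential-map) vectors and the skeleton as a parent list with fixed bone offsets. These representation choices and all function names (`axis_angle_to_matrix`, `geodesic_loss`, `fk_loss`, etc.) are illustrative assumptions, not the AHMR implementation; the exact loss definitions and weighting used in the paper may differ.

```python
import math
from typing import List, Sequence

# Illustrative sketch only; not the AHMR authors' code. Poses are assumed to be
# lists of per-joint axis-angle vectors; the skeleton is a parent list + bone offsets.
Vec3 = List[float]
Mat3 = List[List[float]]


def matmul(a: Mat3, b: Mat3) -> Mat3:
    """3x3 matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def matvec(a: Mat3, v: Sequence[float]) -> Vec3:
    """3x3 matrix times a 3-vector."""
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def transpose(a: Mat3) -> Mat3:
    return [[a[j][i] for j in range(3)] for i in range(3)]


def axis_angle_to_matrix(v: Sequence[float]) -> Mat3:
    """Rodrigues' formula: axis-angle (exponential-map) vector -> rotation matrix."""
    theta = math.sqrt(sum(c * c for c in v))
    if theta < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    x, y, z = (c / theta for c in v)
    k = [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]  # skew-symmetric matrix of the unit axis
    k2 = matmul(k, k)
    s, c = math.sin(theta), 1.0 - math.cos(theta)
    return [[(1.0 if i == j else 0.0) + s * k[i][j] + c * k2[i][j] for j in range(3)]
            for i in range(3)]


def geodesic_distance(r1: Mat3, r2: Mat3) -> float:
    """Rotation angle (radians) of r1^T r2, i.e. the geodesic distance on SO(3)."""
    r = matmul(transpose(r1), r2)
    cos_angle = (r[0][0] + r[1][1] + r[2][2] - 1.0) / 2.0
    return math.acos(max(-1.0, min(1.0, cos_angle)))  # clamp guards against round-off


def forward_kinematics(parents, offsets, local_rots) -> List[Vec3]:
    """Joint positions from local rotations; assumes parents[j] < j and root parent is -1."""
    global_rots: List[Mat3] = []
    positions: List[Vec3] = []
    for j, p in enumerate(parents):
        if p < 0:
            global_rots.append(local_rots[j])
            positions.append(list(offsets[j]))
        else:
            global_rots.append(matmul(global_rots[p], local_rots[j]))
            step = matvec(global_rots[p], offsets[j])  # bone offset rotated into world frame
            positions.append([positions[p][i] + step[i] for i in range(3)])
    return positions


def geodesic_loss(pred_aa, target_aa) -> float:
    """Mean per-joint geodesic distance between predicted and target rotations."""
    dists = [geodesic_distance(axis_angle_to_matrix(p), axis_angle_to_matrix(t))
             for p, t in zip(pred_aa, target_aa)]
    return sum(dists) / len(dists)


def fk_loss(parents, offsets, pred_aa, target_aa) -> float:
    """Mean per-joint Euclidean distance between joint positions after forward kinematics."""
    pred_pos = forward_kinematics(parents, offsets, [axis_angle_to_matrix(v) for v in pred_aa])
    true_pos = forward_kinematics(parents, offsets, [axis_angle_to_matrix(v) for v in target_aa])
    return sum(math.dist(p, t) for p, t in zip(pred_pos, true_pos)) / len(parents)


if __name__ == "__main__":
    # Two axis-angle vectors that encode the same rotation (pi about +x vs. pi about -x).
    a, b = [math.pi, 0.0, 0.0], [-math.pi, 0.0, 0.0]
    print("L2 distance of parameters:", math.dist(a, b))  # 2*pi, despite identical rotations
    print("geodesic distance:", geodesic_distance(axis_angle_to_matrix(a), axis_angle_to_matrix(b)))
```

The demo illustrates the kind of mismatch the abstract alludes to: an L2 loss on raw axis-angle parameters penalizes two encodings of the same rotation by about 2π, whereas the geodesic distance is essentially zero. The forward kinematics loss instead compares the resulting joint positions, so errors near the root of the kinematic chain, which displace every descendant joint, are weighted accordingly.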