This paper tackles the problem of human motion prediction, which consists of forecasting future body poses from historically observed sequences. State-of-the-art approaches provide good results; however, they rely on deep learning architectures of arbitrary complexity, such as Recurrent Neural Networks (RNNs), Transformers, or Graph Convolutional Networks (GCNs), typically requiring multiple training stages and more than 2 million parameters. In this paper, we show that, when combined with a series of standard practices, such as applying the Discrete Cosine Transform (DCT), predicting the residual displacement of joints, and optimizing velocity as an auxiliary loss, a lightweight network based on multi-layer perceptrons (MLPs) with only 0.14 million parameters can surpass the state-of-the-art performance. An exhaustive evaluation on the Human3.6M, AMASS, and 3DPW datasets shows that our method, named siMLPe, consistently outperforms all other approaches. We hope that our simple method can serve as a strong baseline for the community and allow re-thinking of the human motion prediction problem. The code is publicly available at \url{https://github.com/dulucas/siMLPe}.
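To make the three standard practices named above concrete, the following is a minimal NumPy sketch, not the released siMLPe implementation. It assumes poses are stored as an array of shape (frames, joints, 3), that the residual is taken relative to the last observed frame, and that the output has the same number of frames as the input. The single matrix `w_time` is a hypothetical stand-in for the MLP, and the function names `dct_matrix`, `predict`, and `motion_loss` are illustrative only.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix of shape (n, n); row k is the k-th basis vector."""
    k = np.arange(n)[:, None]  # frequency index
    t = np.arange(n)[None, :]  # time index
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (t + 0.5) * k / n)
    mat[0] /= np.sqrt(2.0)     # rescale the constant row so that mat @ mat.T == I
    return mat


def predict(history, w_time):
    """Predict future poses from `history` of shape (T, J, 3).

    `w_time` (T, T) is a hypothetical linear layer over the temporal axis,
    standing in for the MLP described in the abstract.
    """
    dct = dct_matrix(history.shape[0])
    coeffs = np.tensordot(dct, history, axes=(1, 0))    # DCT along time
    mixed = np.tensordot(w_time, coeffs, axes=(1, 0))   # mix temporal coefficients
    residual = np.tensordot(dct.T, mixed, axes=(1, 0))  # inverse DCT (transpose)
    # Assumption: the network output is a displacement from the last observed pose.
    return history[-1:] + residual


def motion_loss(pred, target):
    """Mean per-joint position error plus an auxiliary velocity term."""
    pos = np.linalg.norm(pred - target, axis=-1).mean()
    # Velocity is approximated by the difference between consecutive frames.
    vel = np.linalg.norm(
        np.diff(pred, axis=0) - np.diff(target, axis=0), axis=-1
    ).mean()
    return pos + vel


# Illustrative sizes only: 10 frames, 4 joints.
rng = np.random.default_rng(0)
history = rng.normal(size=(10, 4, 3))
w_time = np.zeros((10, 10))  # untrained layer: zero residual
pred = predict(history, w_time)
```

With `w_time` set to zero, the predicted residual vanishes and every output frame repeats the last observed pose, which is one way to see why a residual formulation gives the network a sensible starting point.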