This paper tackles the problem of human motion prediction, which consists of forecasting future body poses from historically observed sequences. Despite their strong performance, current state-of-the-art approaches rely on deep learning architectures of arbitrary complexity, such as Recurrent Neural Networks~(RNN), Transformers or Graph Convolutional Networks~(GCN), typically requiring multiple training stages and more than 3 million parameters. In this paper we show that the performance of these approaches can be surpassed by a lightweight, pure MLP architecture with only 0.14M parameters when it is appropriately combined with several standard practices, such as representing the body pose with the Discrete Cosine Transform~(DCT), predicting the residual displacement of joints, and optimizing velocity as an auxiliary loss. An exhaustive evaluation on the Human3.6M, AMASS and 3DPW datasets shows that our method, which we dub siMLPe, consistently outperforms all other approaches. We hope that our simple method can serve as a strong baseline for the community and prompt a re-thinking of the problem of human motion prediction, and of whether current benchmarks really need intricate architectural designs. Our code is available at \url{https://github.com/dulucas/siMLPe}.
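The three standard practices named in the abstract (a DCT representation of the pose sequence, residual displacement prediction, and a velocity auxiliary loss) can be illustrated with a minimal NumPy sketch. This is not the siMLPe implementation: the single matrix `w_time` is a hypothetical stand-in for the MLP, taking the residual relative to the last observed pose is an assumption, and so is the L1 form of the velocity term. All function names are illustrative.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix of shape (n, n); row k is the k-th frequency."""
    k = np.arange(n)[:, None]  # frequency index
    t = np.arange(n)[None, :]  # time index
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    d[0, :] /= np.sqrt(2.0)  # rescale the DC row so that d @ d.T == I
    return d


def predict(poses, w_time):
    """Predict T future frames from T observed frames.

    poses:  (T, C) array, T frames of C pose coordinates.
    w_time: (T, T) array, hypothetical learned weights standing in for the MLP.
    """
    d = dct_matrix(poses.shape[0])
    coeffs = d @ poses        # DCT along the time axis
    mixed = w_time @ coeffs   # stand-in for the network (assumption)
    offsets = d.T @ mixed     # inverse DCT; the matrix is orthonormal
    # Residual prediction: the output is a displacement added to the
    # last observed pose (assumed reference frame).
    return poses[-1:] + offsets


def velocity_loss(pred, target):
    """Auxiliary loss on frame-to-frame differences (L1 form is an assumption)."""
    return np.mean(np.abs(np.diff(pred, axis=0) - np.diff(target, axis=0)))
```

With all-zero weights the sketch reduces to repeating the last observed pose, which shows why the residual formulation gives the network a reasonable starting point: it only has to learn the displacement from that pose.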