In this work, we present MotionMixer, an efficient 3D human body pose forecasting model based solely on multi-layer perceptrons (MLPs). MotionMixer learns the spatial-temporal 3D body pose dependencies by sequentially mixing both modalities. Given a stacked sequence of 3D body poses, a spatial MLP extracts fine-grained spatial dependencies of the body joints. The interaction of the body joints over time is then modelled by a temporal MLP. The spatial-temporal mixed features are finally aggregated and decoded to obtain the future motion. To calibrate the influence of each time step in the pose sequence, we make use of squeeze-and-excitation (SE) blocks. We evaluate our approach on the Human3.6M, AMASS, and 3DPW datasets using the standard evaluation protocols. For all evaluations, we demonstrate state-of-the-art performance while using fewer parameters. Our code is available at: https://github.com/MotionMLP/MotionMixer
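The mixing pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' exact implementation: the layer sizes, normalization placement, and the `SEBlock`/`MixerBlock` names are assumptions. A spatial MLP mixes features across joints, a temporal MLP mixes across time steps, and an SE block recalibrates the contribution of each frame.

```python
# Hypothetical sketch of a MotionMixer-style block; shapes and names assumed.
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation over the temporal axis: learns a weight
    per time step to calibrate each frame's influence."""
    def __init__(self, num_frames, reduction=2):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(num_frames, num_frames // reduction),
            nn.ReLU(),
            nn.Linear(num_frames // reduction, num_frames),
            nn.Sigmoid(),
        )

    def forward(self, x):            # x: (batch, frames, joint_dims)
        s = x.mean(dim=-1)           # squeeze spatial dim -> (batch, frames)
        w = self.fc(s)               # per-frame weights in (0, 1)
        return x * w.unsqueeze(-1)   # recalibrate each time step

class MixerBlock(nn.Module):
    """One spatial-then-temporal mixing step over a stacked pose sequence."""
    def __init__(self, num_frames, joint_dims):
        super().__init__()
        self.norm_s = nn.LayerNorm(joint_dims)
        self.spatial_mlp = nn.Sequential(
            nn.Linear(joint_dims, joint_dims), nn.GELU(),
            nn.Linear(joint_dims, joint_dims))
        self.norm_t = nn.LayerNorm(num_frames)
        self.temporal_mlp = nn.Sequential(
            nn.Linear(num_frames, num_frames), nn.GELU(),
            nn.Linear(num_frames, num_frames))
        self.se = SEBlock(num_frames)

    def forward(self, x):            # x: (batch, frames, joint_dims)
        # Mix across joints (spatial), with a residual connection.
        x = x + self.spatial_mlp(self.norm_s(x))
        # Mix across time steps (temporal): transpose so frames are last.
        y = self.norm_t(x.transpose(1, 2))               # (batch, joint_dims, frames)
        x = x + self.temporal_mlp(y).transpose(1, 2)     # back to (batch, frames, joint_dims)
        # Recalibrate per-frame influence with squeeze-and-excitation.
        return self.se(x)

# Example: batch of 2 sequences, 10 past frames, 22 joints x 3 coordinates.
poses = torch.randn(2, 10, 66)
out = MixerBlock(num_frames=10, joint_dims=66)(poses)
print(out.shape)                     # same shape as the input sequence
```

In the full model, several such blocks would be stacked and the mixed features decoded into the future pose sequence; the sketch only shows the spatial/temporal mixing and SE recalibration described in the abstract.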