Human motion prediction is a classical problem in computer vision and computer graphics with a wide range of practical applications. Previous efforts achieve strong empirical performance with an encoding-decoding paradigm: they first encode the observed motions into latent representations and then decode those representations into predicted motions. In practice, however, these methods remain unsatisfactory because of several issues, including complicated loss constraints, cumbersome training processes, and a limited ability to switch between different categories of motion during prediction. In this paper, to address these issues, we depart from the encoding-decoding paradigm and propose a novel framework from a new perspective. Specifically, our framework works in a denoising diffusion style. In the training stage, we learn a motion diffusion model that generates motions from random noise. In the inference stage, we use a denoising procedure to predict motions conditioned on the observed motions, which yields more continuous and controllable predictions. The proposed framework has promising algorithmic properties: it needs only one loss in optimization and is trained in an end-to-end manner. Additionally, it switches effectively between different categories of motion, which is significant in realistic tasks, \textit{e.g.}, the animation task. Comprehensive experiments on benchmarks confirm the superiority of the proposed framework. The project page is available at \url{https://lhchen.top/Human-MAC}.
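The two stages described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a standard DDPM-style noise-prediction objective as the single loss and an inpainting-style masking step as the way observed motions condition the denoising, neither of which is specified in the abstract. All names (`make_schedule`, `q_sample`, `training_loss`, `predict_motion`, `predict_noise`, `obs_mask`) are hypothetical, and `predict_noise` stands in for whatever learned network is used.

```python
import numpy as np

def make_schedule(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Linear noise schedule (an assumed choice) and cumulative signal levels."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alpha_bars = np.cumprod(1.0 - betas)
    return betas, alpha_bars

def q_sample(x0, t, alpha_bars, noise):
    """Forward diffusion: noise a clean motion x0 to step t."""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise

def training_loss(predict_noise, x0, t, alpha_bars, rng):
    """Single training loss: MSE between the true and the predicted noise."""
    noise = rng.standard_normal(x0.shape)
    x_t = q_sample(x0, t, alpha_bars, noise)
    return float(np.mean((predict_noise(x_t, t) - noise) ** 2))

def predict_motion(predict_noise, observed, obs_mask, betas, alpha_bars, rng):
    """Denoise from random noise while pinning the observed frames.

    observed: (frames, dims) array, zero-padded over the future frames.
    obs_mask: 1.0 where a frame is observed, 0.0 where it must be predicted.
    The masking step is an assumed conditioning mechanism.
    """
    x = rng.standard_normal(observed.shape)
    for t in range(len(betas) - 1, -1, -1):
        eps = predict_noise(x, t)
        # Standard DDPM posterior mean for the step t -> t-1.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(1.0 - betas[t])
        if t > 0:
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
            # Observed frames, noised to the same level as x (step t-1).
            known = q_sample(observed, t - 1, alpha_bars, rng.standard_normal(x.shape))
        else:
            x = mean
            known = observed
        # Overwrite observed frames; keep the model's output for future frames.
        x = obs_mask * known + (1.0 - obs_mask) * x
    return x
```

In this sketch the observed frames are re-imposed after every denoising step, so the predicted frames are generated in the context of the observation without a separate encoder or extra loss terms. Under the same assumptions, changing which frames `obs_mask` marks as known is one way such a scheme could support controllable prediction.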