Natural and expressive human motion generation is the holy grail of computer animation. It is a challenging task, due to the diversity of possible motions, human perceptual sensitivity to them, and the difficulty of describing them accurately. As a result, current generative solutions are either low-quality or limited in expressiveness. Diffusion models, which have already shown remarkable generative capabilities in other domains, are promising candidates for human motion due to their many-to-many nature, but they tend to be resource-hungry and hard to control. In this paper, we introduce Motion Diffusion Model (MDM), a carefully adapted classifier-free diffusion-based generative model for the human motion domain. MDM is transformer-based, combining insights from the motion generation literature. A notable design choice is the prediction of the sample, rather than the noise, in each diffusion step. This facilitates the use of established geometric losses on the locations and velocities of the motion, such as the foot contact loss. As we demonstrate, MDM is a generic approach, enabling different modes of conditioning and different generation tasks. We show that our model is trained with lightweight resources and yet achieves state-of-the-art results on leading benchmarks for text-to-motion and action-to-motion. Project page: https://guytevet.github.io/mdm-page/
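To make the sample-prediction design choice concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes a motion is stored as joint positions of shape (frames, joints, 3); the function names, the weight `lambda_vel`, and the guidance `scale` are hypothetical and chosen for illustration only. Because the network output is an estimate of the clean motion itself, a velocity term can be computed directly on it, and a classifier-free model can blend its conditional and unconditional estimates at sampling time.

```python
import numpy as np


def add_noise(x0, alpha_bar_t, rng):
    """Forward diffusion: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps


def sample_prediction_loss(x0, x0_hat, lambda_vel=1.0):
    """Loss on a predicted clean sample x0_hat (not on predicted noise).

    x0, x0_hat: arrays of shape (frames, joints, 3) holding joint positions
    (an assumed representation). lambda_vel is a hypothetical weight.
    """
    # Reconstruction term: mean squared error on joint locations.
    recon = np.mean((x0 - x0_hat) ** 2)
    # Velocity term: mean squared error on frame-to-frame differences.
    vel = np.diff(x0, axis=0)
    vel_hat = np.diff(x0_hat, axis=0)
    vel_loss = np.mean((vel - vel_hat) ** 2)
    return recon + lambda_vel * vel_loss


def classifier_free_guidance(x0_uncond, x0_cond, scale):
    """Blend unconditional and conditional sample predictions.

    scale = 0 gives the unconditional estimate, scale = 1 the conditional one,
    and scale > 1 extrapolates toward the condition.
    """
    return x0_uncond + scale * (x0_cond - x0_uncond)
```

In a training loop, `x0_hat` would come from a network that receives the noised motion from `add_noise`, the diffusion step, and the condition; that network is omitted here. A foot contact term would follow the same pattern as the velocity term, restricted to frames and joints marked as in contact.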