Executing your Commands via Motion Diffusion in Latent Space

We study conditional human motion generation, a challenging task that produces plausible human motion sequences from conditional inputs such as action classes or textual descriptions. Because human motions are highly diverse and follow a distribution quite different from that of the conditional modalities, such as natural-language text, it is hard to learn a probabilistic mapping from the desired conditional modality to human motion sequences. Moreover, raw motion data from a motion capture system can be redundant across the sequence and contain noise; directly modeling the joint distribution over raw motion sequences and conditional modalities would incur heavy computational overhead and might introduce artifacts from the captured noise. To learn a better representation of diverse human motion sequences, we first design a powerful Variational AutoEncoder (VAE) that yields a representative, low-dimensional latent code for a human motion sequence. Then, instead of using a diffusion model to connect raw motion sequences with the conditional inputs, we perform the diffusion process in the motion latent space. Our proposed Motion Latent-based Diffusion model (MLD) can produce vivid motion sequences that conform to the given conditional inputs while substantially reducing the computational overhead in both the training and inference stages. Extensive experiments on various human motion generation tasks demonstrate that MLD achieves significant improvements over state-of-the-art methods and is two orders of magnitude faster than previous diffusion models that operate on raw motion sequences.
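
As a rough illustration of what running a diffusion process in a motion latent space could look like, the sketch below applies the standard DDPM forward noising step to a single latent vector instead of to raw motion frames. The latent size, the number of steps, the linear beta schedule, and every function name here are assumptions chosen for illustration and are not taken from the paper; the VAE encoder that would produce `z0` is replaced by a random stand-in.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (an assumed schedule, not the paper's)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def noise_latent(z0, t, alpha_bars, rng):
    """Sample z_t ~ N(sqrt(alpha_bar_t) * z0, (1 - alpha_bar_t) * I).

    Returns the noised latent and the noise that was added, which is the
    usual regression target for a denoising network.
    """
    a_bar = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    zt = [math.sqrt(a_bar) * z + math.sqrt(1.0 - a_bar) * e
          for z, e in zip(z0, eps)]
    return zt, eps


rng = random.Random(0)
LATENT_DIM = 256   # hypothetical latent size
NUM_STEPS = 1000   # hypothetical number of diffusion steps

alpha_bars = cumulative_alphas(linear_beta_schedule(NUM_STEPS))

# Stand-in for the VAE encoder output of one motion sequence.
z0 = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]

# Noise the latent at the final step; by then almost no signal remains.
zt, eps = noise_latent(z0, NUM_STEPS - 1, alpha_bars, rng)
```

In a full system, a denoiser conditioned on the text or action input would be trained to predict `eps` from `zt` and `t`, and a VAE decoder would map the denoised latent back to a motion sequence; both are omitted here. The point of the sketch is only that each diffusion step touches one fixed-size latent vector rather than every frame of the raw sequence.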

