The emergence of neural networks has revolutionized the field of motion synthesis. Yet, learning to unconditionally synthesize motions from a given distribution remains a challenging task, especially when the motions are highly diverse. In this work, we present MoDi, a generative model trained in a completely unsupervised setting from an extremely diverse, unstructured and unlabeled motion dataset. During inference, MoDi can synthesize high-quality, diverse motions that lie in a well-behaved and highly semantic latent space. We show that despite the lack of any structure in the dataset, the latent space can be semantically clustered, facilitating various applications including semantic editing, crowd simulation and motion interpolation. Our qualitative and quantitative experiments show that our framework achieves state-of-the-art synthesis quality and can follow the distribution of highly diverse motion datasets. Code and trained models are available at https://sigal-raab.github.io/MoDi.