The emergence of neural networks has revolutionized the field of motion synthesis. Yet, learning to unconditionally synthesize motions from a given distribution remains a challenging task, especially when the motions are highly diverse. We present MoDi, an unconditional generative model that synthesizes diverse motions. Our model is trained in a completely unsupervised setting from a diverse, unstructured and unlabeled motion dataset and yields a well-behaved, highly semantic latent space. The design of our model follows the prolific architecture of StyleGAN and adapts two of its key technical components into the motion domain: a set of style-codes injected into each level of the generator hierarchy and a mapping function that learns and forms a disentangled latent space. We show that despite the lack of any structure in the dataset, the latent space can be semantically clustered, and facilitates semantic editing and motion interpolation. In addition, we propose a technique to invert unseen motions into the latent space, and demonstrate latent-based motion editing operations that otherwise cannot be achieved by naive manipulation of explicit motion representations. Our qualitative and quantitative experiments show that our framework achieves state-of-the-art synthesis quality that can follow the distribution of highly diverse motion datasets. Code and trained models will be released at https://sigal-raab.github.io/MoDi.
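The two StyleGAN components named above, a mapping function from a sampled latent code to a learned latent space and interpolation between codes in that space, can be illustrated with a minimal sketch. This is not MoDi's implementation: the function names (`make_mapping_network`, `map_to_w`, `interpolate_w`), the layer count, the latent dimension, and the randomly initialized (untrained) weights are all assumptions chosen for illustration, and the generator that would consume the resulting style-codes at each level is omitted.

```python
import numpy as np


def make_mapping_network(latent_dim, num_layers, rng):
    """Build random, untrained weights for a small MLP (hypothetical stand-in
    for a learned mapping function from z to w)."""
    return [
        (
            rng.standard_normal((latent_dim, latent_dim)) / np.sqrt(latent_dim),
            np.zeros(latent_dim),
        )
        for _ in range(num_layers)
    ]


def map_to_w(z, layers, negative_slope=0.2):
    """Map a sampled latent code z to an intermediate code w."""
    # Normalize z to unit RMS before the MLP, as StyleGAN-style mappers do.
    h = z / np.sqrt(np.mean(z ** 2, axis=-1, keepdims=True) + 1e-8)
    for weight, bias in layers:
        h = h @ weight + bias
        # Leaky ReLU activation.
        h = np.where(h > 0, h, negative_slope * h)
    return h


def interpolate_w(w_start, w_end, num_steps):
    """Linearly interpolate between two codes in the mapped latent space."""
    t = np.linspace(0.0, 1.0, num_steps)[:, None]
    return (1.0 - t) * w_start + t * w_end


# Illustrative sizes only; the source does not state these values.
rng = np.random.default_rng(0)
layers = make_mapping_network(latent_dim=64, num_layers=8, rng=rng)

z_a, z_b = rng.standard_normal((2, 64))
w_a, w_b = map_to_w(z_a, layers), map_to_w(z_b, layers)

# Each row of w_path is a style-code that a generator could turn into a motion.
w_path = interpolate_w(w_a, w_b, num_steps=5)
```

In a trained model, each row of `w_path` would be injected into every level of the generator hierarchy, so that decoding the rows in order yields a gradual transition between the two motions.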