We propose a novel deep generative model based on causal convolutions for multi-subject motion modeling and synthesis, inspired by the success of WaveNet in multi-subject speech synthesis. Adapting WaveNet to high-dimensional, physically constrained motion data is nontrivial, however. To this end, we add an encoder and a decoder to WaveNet to translate the motion data into features and the features back into predicted motions. We also add 1D convolution layers that take the skeleton configuration as input to model skeleton variations across different subjects. As a result, our network scales well to large motion datasets spanning multiple subjects and supports various applications, such as random and controllable motion synthesis, motion denoising, and motion completion, in a unified way. Complex motions, such as punching, kicking, and kicking while punching, are also handled well. Moreover, our network can synthesize motions for novel skeletons not in the training dataset. After fine-tuning the network with a small amount of motion data for the novel skeleton, it is able to capture the personalized style implied in the motion and generate high-quality motions for that skeleton. Thus, it has the potential to be used as a pre-trained network in few-shot learning for motion modeling and synthesis. Experimental results show that our model effectively handles variation in skeleton configurations and runs fast enough to synthesize different types of motions online. We also perform user studies to verify that the quality of motions generated by our network is superior to that of motions generated by state-of-the-art human motion synthesis methods.
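To illustrate the causal convolutions mentioned in the abstract, here is a minimal NumPy sketch of a single dilated causal 1D convolution over a sequence of motion feature frames. It is not the authors' network: the function name `causal_conv1d`, the tensor layout, and the feature sizes are assumptions chosen for illustration only. The point it shows is that each output frame depends only on the current and earlier input frames, which is what allows motions to be generated frame by frame.

```python
import numpy as np

def causal_conv1d(x, w, dilation=1):
    """Dilated causal 1D convolution (illustrative sketch, hypothetical layout).

    x: array of shape (T, C_in), one feature vector per motion frame.
    w: array of shape (K, C_in, C_out), K filter taps.
    Returns an array of shape (T, C_out) where output frame t depends only
    on input frames t, t - dilation, ..., t - (K - 1) * dilation.
    """
    T, c_in = x.shape
    K, _, c_out = w.shape
    # Left-pad with zeros so that no output frame can see future input frames.
    pad = (K - 1) * dilation
    x_padded = np.concatenate([np.zeros((pad, c_in)), x], axis=0)
    y = np.zeros((T, c_out))
    for k in range(K):
        # Tap k reads the input (K - 1 - k) * dilation frames in the past.
        start = k * dilation
        y += x_padded[start:start + T] @ w[k]
    return y
```

Stacking such layers with increasing dilation, as in WaveNet, grows the receptive field over past frames quickly without ever letting a frame depend on the future.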