In this work, we propose a novel deep learning framework that generates vivid dances from whole pieces of music. In contrast to previous works that define the problem as the generation of frames of motion state parameters, we formulate the task as the prediction of motion curves between key poses, inspired by animation industry practice. The proposed framework, named DanceNet3D, first generates key poses on the beats of the given music and then predicts the in-between motion curves. DanceNet3D adopts an encoder-decoder architecture and adversarial training schemes. The decoders in DanceNet3D are built on MoTrans, a transformer tailored for motion generation. In MoTrans we introduce kinematic correlation through Kinematic Chain Networks, and we propose a Learned Local Attention module that accounts for the temporal local correlation of human motion. Furthermore, we present PhantomDance, the first large-scale dance dataset produced by professional animators, with accurate synchronization to music. Extensive experiments demonstrate that the proposed approach generates fluent, elegant, performative and beat-synchronized 3D dances, significantly surpassing previous works both quantitatively and qualitatively.
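To make the notion of temporal local correlation concrete, the following is a minimal sketch of a banded (temporally local) attention over a motion sequence. It assumes a fixed window size and illustrative tensor names; the paper's Learned Local Attention presumably learns the locality structure rather than fixing it, so this is an assumption-laden illustration, not the actual MoTrans module.

```python
# Hypothetical sketch of temporally-local attention over a motion sequence.
# The fixed window size `w` and all names are illustrative assumptions,
# not the paper's actual Learned Local Attention formulation.
import torch
import torch.nn.functional as F

def local_attention(q, k, v, w=8):
    """q, k, v: (batch, time, dim); each frame attends only to frames within +/- w steps."""
    scores = torch.matmul(q, k.transpose(-2, -1)) / q.size(-1) ** 0.5  # (B, T, T) scaled dot products
    t = q.size(1)
    idx = torch.arange(t, device=q.device)
    band = (idx[None, :] - idx[:, None]).abs() <= w                    # (T, T) boolean band mask
    scores = scores.masked_fill(~band, float("-inf"))                  # block out-of-window frames
    return torch.matmul(F.softmax(scores, dim=-1), v)

# Usage: frames outside the local window receive zero attention weight.
x = torch.randn(2, 64, 32)   # 2 sequences, 64 frames, 32-dim motion features
out = local_attention(x, x, x, w=8)
```

In this sketch the locality is hard-coded through the band mask; a learned variant would instead parameterize which temporal offsets each frame may attend to.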