Dancing to music is an innate human ability dating back to ancient times. In machine learning research, however, synthesizing dance movements from music is a challenging problem. Recently, researchers have synthesized human motion sequences with autoregressive models such as recurrent neural networks (RNNs). This approach often produces only short sequences, because prediction errors accumulate as generated outputs are fed back into the network, and the problem becomes even more severe in long motion sequence generation. Moreover, the consistency between dance and music in terms of style, rhythm, and beat has yet to be taken into account during modeling. In this paper, we formalize music-driven dance generation as a sequence-to-sequence learning problem and devise a novel seq2seq architecture that efficiently processes long sequences of music features and captures the fine-grained correspondence between music and dance. Furthermore, we propose a curriculum learning strategy to alleviate the error accumulation of autoregressive models in long motion sequence generation, which gently shifts the training process from a fully guided teacher-forcing scheme using the previous ground-truth movements toward a less guided autoregressive scheme that mostly uses the generated movements instead. Extensive experiments demonstrate that our approach significantly outperforms existing methods on both automatic metrics and human evaluation. The code and data are available at https://github.com/stonyhu/DanceRevolution.
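To make the curriculum learning strategy concrete, the sketch below shows one common way such a schedule can be realized in PyTorch: a scheduled-sampling-style decoding loop that anneals from feeding back ground-truth movements to feeding back the model's own predictions. This is a minimal illustration, not the authors' implementation; the decoder interface, the linear decay schedule, and all names (`teacher_forcing_ratio`, `decode_sequence`, `decay_epochs`) are assumptions made for this example.

```python
import torch
import torch.nn as nn

# Minimal sketch of a teacher-forcing-to-autoregressive curriculum.
# NOT the authors' exact implementation: the decoder signature and the
# linear decay schedule are assumptions for illustration.

def teacher_forcing_ratio(epoch: int, decay_epochs: int = 100) -> float:
    """Linearly anneal from fully guided (1.0) toward autoregressive (0.0)."""
    return max(0.0, 1.0 - epoch / decay_epochs)

def decode_sequence(decoder: nn.Module,
                    music_context: torch.Tensor,  # (batch, seq_len, ctx_dim)
                    gt_motion: torch.Tensor,      # (batch, seq_len, pose_dim)
                    epoch: int) -> torch.Tensor:
    """Roll out the decoder, mixing ground-truth and generated frames."""
    ratio = teacher_forcing_ratio(epoch)
    batch, seq_len, pose_dim = gt_motion.shape
    prev = gt_motion[:, 0]                        # seed with the first pose
    hidden = None
    outputs = []
    for t in range(seq_len):
        # Hypothetical decoder interface: previous pose + music feature
        # at step t + recurrent state -> predicted pose + new state.
        pred, hidden = decoder(prev, music_context[:, t], hidden)
        outputs.append(pred)
        # With probability `ratio`, feed back the ground truth (teacher
        # forcing); otherwise feed back the model's own prediction.
        if torch.rand(()) < ratio:
            prev = gt_motion[:, t]
        else:
            prev = pred.detach()                  # stop gradients through feedback
    return torch.stack(outputs, dim=1)            # (batch, seq_len, pose_dim)
```

Early in training, `ratio` is close to 1 and the decoder sees mostly ground-truth poses; as epochs progress it increasingly conditions on its own outputs, which exposes it during training to the error distribution it will face at inference time.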