Dancing to music has been an innate human ability since ancient times. In machine learning research, however, synthesizing dance movements from music remains a challenging problem. Recent work synthesizes human motion sequences with autoregressive models such as recurrent neural networks (RNNs). Such approaches are often limited to short sequences because prediction errors are fed back into the network and accumulate, and the problem becomes more severe when generating long motion sequences. Moreover, the consistency between dance and music in terms of style, rhythm, and beat has yet to be taken into account during modeling. In this paper, we formalize music-driven dance generation as a sequence-to-sequence learning problem and devise a novel seq2seq architecture that efficiently processes long sequences of music features and captures the fine-grained correspondence between music and dance. Furthermore, we propose a novel curriculum learning strategy to alleviate the error accumulation of autoregressive models in long motion sequence generation. The strategy gently shifts training from a fully guided teacher-forcing scheme, which feeds the previous ground-truth movements, towards a less guided autoregressive scheme that mostly feeds the generated movements instead. Extensive experiments show that our approach significantly outperforms existing state-of-the-art methods on automatic metrics and in human evaluation. We also provide a demo video in the supplementary material to demonstrate the superior performance of our proposed approach.
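The shift from teacher forcing to autoregressive training can be illustrated with a small sketch. The abstract does not specify the actual schedule, so the code below swaps in a generic scheduled-sampling style mix with a linear decay; it is not the authors' exact curriculum. The names `teacher_forcing_prob`, `rollout`, `step_fn`, and `floor` are hypothetical, and `step_fn` stands in for one decoder step of any motion model.

```python
import random


def teacher_forcing_prob(epoch, total_epochs, floor=0.1):
    """Probability of feeding the ground-truth pose at a given epoch.

    Assumption: a linear decay from 1.0 (fully guided) down to `floor`
    (mostly autoregressive). The source does not state the schedule.
    """
    if total_epochs <= 1:
        return floor
    frac = min(max(epoch / (total_epochs - 1), 0.0), 1.0)
    return 1.0 - (1.0 - floor) * frac


def rollout(step_fn, first_pose, ground_truth, p_teacher, rng):
    """Predict one pose per ground-truth frame.

    After each step, the next decoder input is the ground-truth pose with
    probability `p_teacher`, otherwise the model's own prediction.
    """
    outputs = []
    prev = first_pose
    for t, target in enumerate(ground_truth):
        pred = step_fn(prev, t)  # hypothetical single decoder step
        outputs.append(pred)
        prev = target if rng.random() < p_teacher else pred
    return outputs


if __name__ == "__main__":
    rng = random.Random(0)
    toy_step = lambda prev, t: prev + 1  # toy stand-in for a motion decoder
    for epoch in (0, 5, 9):
        p = teacher_forcing_prob(epoch, total_epochs=10)
        print(epoch, round(p, 2), rollout(toy_step, 0, [10, 20, 30], p, rng))
```

With `p_teacher` at 1.0 every input comes from the ground truth, which is plain teacher forcing; as it decays, more inputs come from the model's own outputs, so the model is gradually exposed to its own errors during training.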