How to automatically synthesize natural-looking dance movements from a piece of music is an increasingly popular yet challenging task. Most existing data-driven approaches require hard-to-obtain paired training data and fail to generate long motion sequences due to the error accumulation of their autoregressive structure. We present a novel 3D dance synthesis system that needs only unpaired data for training and can generate realistic long-term motions. For unpaired-data training, we explore the disentanglement of beat and style and propose a Transformer-based model free of any reliance on paired data. For the synthesis of long-term motions, we devise a new long-history attention strategy: it first queries a long-history embedding through an attention computation and then explicitly fuses this embedding into the generation pipeline via a multimodal adaptation gate (MAG). Objective and subjective evaluations show that our results are comparable to those of strong baseline methods, despite not requiring paired training data, and remain robust when inferring on long-term music. To the best of our knowledge, we are the first to achieve unpaired-data training for this task, an ability that effectively alleviates data limitations. Our code is released at https://github.com/BFeng14/RobustDancer
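
To make the long-history attention and MAG fusion concrete, the following is a minimal sketch in PyTorch of how such a module could look. It is not the released implementation: the class name `LongHistoryMAG`, the tensor shapes, and the gating layout are illustrative assumptions; only the overall idea (query a long-history memory with attention, then gate the retrieved embedding into the current generation states) follows the description above.

```python
import torch
import torch.nn as nn

class LongHistoryMAG(nn.Module):
    """Hypothetical sketch of long-history attention + multimodal adaptation gate (MAG).

    The current decoder states query a memory of past motion embeddings via
    attention; the retrieved long-history embedding is then fused back into
    the generation pipeline through a learned gate. Names and shapes are
    illustrative, not the paper's actual code.
    """

    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        # Attention that queries the long-history memory with the current states.
        self.history_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Gate deciding how much of the history embedding to inject per frame.
        self.gate = nn.Sequential(nn.Linear(2 * d_model, d_model), nn.Sigmoid())
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, current: torch.Tensor, history: torch.Tensor) -> torch.Tensor:
        # current: (B, T, d_model) states for the frames being generated
        # history: (B, H, d_model) embeddings of long-past motion frames
        hist_emb, _ = self.history_attn(query=current, key=history, value=history)
        g = self.gate(torch.cat([current, hist_emb], dim=-1))  # (B, T, d_model)
        # Gated fusion: explicitly inject the long-history signal.
        return current + g * self.proj(hist_emb)


if __name__ == "__main__":
    fuse = LongHistoryMAG(d_model=256)
    cur = torch.randn(2, 30, 256)    # current window of motion tokens
    past = torch.randn(2, 240, 256)  # long-history memory
    print(fuse(cur, past).shape)     # torch.Size([2, 30, 256])
```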