Dance-driven music generation aims to generate musical pieces conditioned on dance videos. Previous work focuses on monophonic or raw-audio generation, while the multi-instrument scenario remains under-explored. The challenges of dance-driven multi-instrument music (MIDI) generation are two-fold: 1) there is no publicly available paired dataset of multi-instrument MIDI and video, and 2) the correlation between music and video is weak. To tackle these challenges, we build the first paired multi-instrument MIDI and dance dataset (D2MIDI). Based on this dataset, we introduce Dance2MIDI, a multi-instrument MIDI generation framework conditioned on dance video. Specifically, 1) to model the correlation between music and dance, we encode the dance motion with a graph convolutional network (GCN), and 2) to generate harmonious and coherent music, we employ a Transformer to decode the MIDI sequence. We evaluate the music generated by our framework trained on the D2MIDI dataset and demonstrate that our method outperforms existing methods. The data and code are available at https://github.com/Dance2MIDI/Dance2MIDI
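For illustration, the sketch below shows the kind of encoder-decoder pipeline the abstract describes: a GCN over pose keypoints produces per-frame motion features, and an autoregressive Transformer decoder predicts MIDI event tokens conditioned on them. All dimensions, the joint adjacency, and the event vocabulary are assumptions for this sketch, not the released Dance2MIDI implementation.

```python
# Minimal sketch of a GCN motion encoder + Transformer MIDI decoder (PyTorch).
# Sizes, adjacency, and the MIDI token vocabulary are illustrative assumptions.
import torch
import torch.nn as nn


class GraphConv(nn.Module):
    """One spatial graph convolution over pose keypoints: X' = relu(A X W)."""

    def __init__(self, in_dim, out_dim, adj):
        super().__init__()
        self.register_buffer("adj", adj)           # (J, J) normalized joint adjacency
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, x):                           # x: (B, T, J, C)
        x = torch.einsum("ij,btjc->btic", self.adj, x)
        return torch.relu(self.proj(x))


class MotionEncoder(nn.Module):
    """Encodes a 2D-pose sequence into per-frame motion features."""

    def __init__(self, adj, in_dim=2, hidden=64, d_model=256):
        super().__init__()
        self.gcn1 = GraphConv(in_dim, hidden, adj)
        self.gcn2 = GraphConv(hidden, hidden, adj)
        self.to_frame = nn.Linear(hidden * adj.shape[0], d_model)

    def forward(self, pose):                        # pose: (B, T, J, 2)
        h = self.gcn2(self.gcn1(pose))              # (B, T, J, hidden)
        return self.to_frame(h.flatten(2))          # (B, T, d_model)


class MidiDecoder(nn.Module):
    """Autoregressive Transformer decoder over a MIDI event-token vocabulary."""

    def __init__(self, vocab_size=512, d_model=256, n_layers=4, n_heads=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens, memory):              # tokens: (B, S), memory: (B, T, d_model)
        tgt = self.embed(tokens)
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        out = self.decoder(tgt, memory, tgt_mask=mask)
        return self.head(out)                       # (B, S, vocab_size) next-token logits


if __name__ == "__main__":
    J = 17                                           # e.g. COCO keypoints (assumption)
    adj = torch.eye(J)                               # placeholder; use the real skeleton graph
    encoder, decoder = MotionEncoder(adj), MidiDecoder()
    pose = torch.randn(1, 120, J, 2)                 # 120 frames of 2D joints
    tokens = torch.randint(0, 512, (1, 64))          # previously generated MIDI event tokens
    logits = decoder(tokens, encoder(pose))
    print(logits.shape)                              # torch.Size([1, 64, 512])
```

In this sketch the decoder attends to the motion features through standard cross-attention; the actual paper's conditioning scheme, token format, and training objective should be taken from the code linked above.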