Data is the lifeblood of modern machine learning systems, including those in Music Information Retrieval (MIR). However, MIR has long been mired in small datasets and unreliable labels. In this work, we propose to break this bottleneck using generative modeling. By pipelining a generative model of notes (Coconet, trained on Bach Chorales) with a structured synthesis model of chamber ensembles (MIDI-DDSP, trained on URMP), we demonstrate a system capable of producing unlimited amounts of realistic chorale music with rich annotations, including mixes, stems, MIDI, note-level performance attributes (staccato, vibrato, etc.), and even fine-grained synthesis parameters (pitch, amplitude, etc.). We call this system the Chamber Ensemble Generator (CEG), and we use it to generate a large dataset of chorales from four different chamber ensembles (CocoChorales). We demonstrate that data generated using our approach improves state-of-the-art models for music transcription and source separation, and we release both the system and the dataset as an open-source foundation for future work in the MIR community.
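To make the two-stage pipeline concrete, the following is a minimal sketch of the CEG data flow. The function names (sample_chorale_midi, synthesize_performance) and the ChoraleExample container are illustrative stand-ins, not the actual Coconet or MIDI-DDSP APIs; only the staged structure and the annotation layers (MIDI, note attributes, synthesis parameters, stems, mix) mirror what the abstract describes.

```python
"""Illustrative sketch of the Chamber Ensemble Generator (CEG) data flow.

Stage 1 (a Coconet-like note model) samples a four-part chorale as MIDI;
Stage 2 (a MIDI-DDSP-like synthesis model) expands it into expressive audio.
All names and placeholder values here are hypothetical.
"""
from dataclasses import dataclass


@dataclass
class ChoraleExample:
    """One generated training example with the annotation layers named above."""
    midi: list            # note events sampled by the note-level model
    note_attributes: dict  # per-note performance attributes (vibrato, staccato, ...)
    synth_params: dict    # frame-level synthesis parameters (pitch, amplitude, ...)
    stems: dict           # one audio array per instrument in the ensemble
    mix: list             # sum of the stems


def sample_chorale_midi() -> list:
    """Stage 1 stand-in: sample a four-part chorale as MIDI note events."""
    return [{"part": p, "pitch": 60 + p, "start": 0.0, "end": 1.0}
            for p in range(4)]


def synthesize_performance(midi: list):
    """Stage 2 stand-in: turn MIDI into attributes, synth params, and audio."""
    note_attributes = {i: {"vibrato": 0.3, "staccato": 0.0}
                       for i, _ in enumerate(midi)}
    synth_params = {"f0_hz": [261.6] * 100, "amplitude": [0.5] * 100}
    stems = {note["part"]: [0.0] * 16000 for note in midi}  # placeholder audio
    mix = [sum(samples) for samples in zip(*stems.values())]
    return note_attributes, synth_params, stems, mix


def generate_example() -> ChoraleExample:
    """Run both stages and package every annotation layer with the audio."""
    midi = sample_chorale_midi()
    note_attributes, synth_params, stems, mix = synthesize_performance(midi)
    return ChoraleExample(midi, note_attributes, synth_params, stems, mix)
```

Because every annotation is produced by the pipeline itself rather than by human labelers, each generated example is, by construction, paired with exact ground truth at every level, which is what makes the output directly usable for training transcription and source-separation models.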