In this work, we present SymphonyNet, a symbolic symphony music generation solution based on a permutation-invariant language model. To bridge the gap between the text generation and symphony generation tasks, we propose a novel Multi-track Multi-instrument Repeatable (MMR) representation with a dedicated 3-D positional embedding, together with a modified Byte Pair Encoding algorithm for music tokens (Music BPE). A novel linear transformer decoder architecture serves as the backbone for modeling extra-long sequences of symphony tokens. In addition, we train the decoder to learn automatic orchestration as a joint task by masking instrument information in the input. We also introduce a large-scale symbolic symphony dataset to advance symphony generation research. Our empirical results show that the proposed approach can generate coherent, novel, complex, and harmonious symphonies comparable to human compositions, making it a pioneering solution for multi-track, multi-instrument symbolic music generation.
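The abstract does not say how Music BPE modifies the standard algorithm. As background only, here is a minimal sketch of plain Byte Pair Encoding on a toy sequence of note tokens. It repeatedly merges the most frequent adjacent pair into a single new token. The pitch-name tokens, the helper names (`learn_bpe`, `merge_pair`, `most_frequent_pair`), and the stopping rule (stop once no pair occurs more than once) are illustrative assumptions, not SymphonyNet's implementation.

```python
from collections import Counter
from typing import List, Optional, Tuple

# A token is a tuple of base note tokens; merged tokens are longer tuples.
Token = Tuple[str, ...]
Pair = Tuple[Token, Token]


def most_frequent_pair(seqs: List[List[Token]]) -> Optional[Pair]:
    """Return the most frequent adjacent pair, or None if no pair repeats."""
    counts: Counter = Counter()
    for seq in seqs:
        for a, b in zip(seq, seq[1:]):
            counts[(a, b)] += 1
    if not counts:
        return None
    pair, freq = counts.most_common(1)[0]
    # Assumed stopping rule: a pair seen only once is not worth a new token.
    return pair if freq > 1 else None


def merge_pair(seq: List[Token], pair: Pair) -> List[Token]:
    """Replace non-overlapping occurrences of `pair`, scanning left to right."""
    merged = pair[0] + pair[1]
    out: List[Token] = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(merged)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def learn_bpe(corpus: List[List[str]], num_merges: int):
    """Learn up to `num_merges` merges; return (merges, tokenized corpus)."""
    seqs = [[(t,) for t in s] for s in corpus]
    merges: List[Pair] = []
    for _ in range(num_merges):
        pair = most_frequent_pair(seqs)
        if pair is None:
            break
        merges.append(pair)
        seqs = [merge_pair(s, pair) for s in seqs]
    return merges, seqs


# Hypothetical toy corpus: two note-token sequences (pitch names only).
corpus = [["C4", "E4", "G4", "C4", "E4", "G4"], ["C4", "E4", "G4", "A4"]]
merges, tokenized = learn_bpe(corpus, num_merges=2)
print(tokenized)
```

After two merges, the recurring C4, E4, G4 pattern becomes a single token. This shortens both sequences, which is how BPE-style merging reduces the length of long token sequences.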