In this work, we propose a permutation-invariant language model, SymphonyNet, as a solution for symbolic symphonic music generation. We propose a novel Multi-track Multi-instrument Repeatable (MMR) representation for symphonic music and model the music sequence using a Transformer-based auto-regressive language model with a dedicated 3-D positional embedding. To avoid length overflow when modeling extremely long symphony token sequences, we also propose a modified Byte Pair Encoding algorithm (Music BPE) for music tokens and introduce a novel linear transformer decoder architecture as the backbone. In addition, we train the decoder to learn automatic orchestration as a joint task by masking instrument information in the input. We also introduce a large-scale symbolic symphony dataset to advance research on symphony generation. Empirical results show that the proposed approach can generate coherent, novel, complex and harmonious symphonic music, making it a pioneering solution for multi-track multi-instrument symbolic music generation.
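To give a sense of how Byte Pair Encoding shortens a token sequence, the sketch below runs the standard text-style BPE merge loop over lists of note tokens. It is not the Music BPE variant proposed here, whose modifications the abstract does not describe. The function names, the `"+"` join convention, and the toy note tokens are all assumptions made for illustration.

```python
from collections import Counter


def most_frequent_pair(sequences):
    """Return the most common adjacent token pair, or None if there is none."""
    counts = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def merge_pair(seq, pair, new_token):
    """Replace every non-overlapping occurrence of `pair` with `new_token`."""
    out = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_token)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def learn_bpe(sequences, num_merges):
    """Generic BPE: repeatedly merge the most frequent adjacent pair.

    Hypothetical helper for illustration; merged tokens are named by
    joining the two parts with "+".
    """
    sequences = [list(s) for s in sequences]
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(sequences)
        if pair is None:
            break
        new_token = pair[0] + "+" + pair[1]
        merges.append(pair)
        sequences = [merge_pair(s, pair, new_token) for s in sequences]
    return merges, sequences


# Toy corpus of note tokens (made up for this example).
corpus = [["C4", "E4", "G4", "C4", "E4"], ["C4", "E4", "A4"]]
merges, encoded = learn_bpe(corpus, num_merges=1)
```

Each merge replaces a frequent pair with a single token, so the encoded sequences get shorter as the vocabulary grows. That trade is what makes BPE a way to fit long sequences into a fixed context length.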