Recent advances in generative models have shown remarkable progress in music generation. However, most existing methods focus on generating monophonic or homophonic music, while the generation of polyphonic, multi-track music with rich attributes remains a challenging task. In this paper, we propose a novel approach for multi-track, multi-attribute symphonic music generation using a diffusion model. Specifically, we generate piano-roll representations with a diffusion model and map them to MIDI format for output. To capture rich attribute information, we introduce a color coding scheme that encodes note sequences into color and position information representing pitch, velocity, and instrument. This scheme enables a seamless mapping between discrete music sequences and continuous images. We also propose a post-processing method that refines the generated scores for better performance. Experimental results show that our method outperforms state-of-the-art methods in polyphonic music generation with rich attribute information.
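To make the color coding scheme concrete, the sketch below shows one plausible way to realize such a note-to-image mapping and its inverse. The abstract does not specify the exact encoding, so the details here are assumptions: pitch maps to the image row, onset time to the column, velocity to color intensity, and instrument to a hue drawn from a small hypothetical palette.

```python
import numpy as np

# Hypothetical instrument palette; the paper's actual color assignments
# are not specified in the abstract.
INSTRUMENT_PALETTE = {
    0: (1.0, 0.0, 0.0),  # instrument 0 -> red
    1: (0.0, 1.0, 0.0),  # instrument 1 -> green
    2: (0.0, 0.0, 1.0),  # instrument 2 -> blue
}

def notes_to_pianoroll(notes, n_pitches=128, n_steps=256):
    """Encode notes as an RGB piano-roll image.

    notes: list of (pitch, velocity, instrument, start, duration) tuples,
    with start/duration in time steps and velocity in 0..127.
    """
    roll = np.zeros((n_pitches, n_steps, 3), dtype=np.float32)
    for pitch, velocity, instrument, start, duration in notes:
        hue = np.array(INSTRUMENT_PALETTE[instrument], dtype=np.float32)
        brightness = velocity / 127.0  # velocity -> color intensity
        end = min(start + duration, n_steps)
        roll[pitch, start:end] = hue * brightness
    return roll

def pianoroll_to_notes(roll, threshold=0.05):
    """Inverse mapping: recover (pitch, velocity, instrument, start, duration)
    by matching each colored horizontal run to the nearest palette entry."""
    palette = {k: np.array(v, dtype=np.float32) for k, v in INSTRUMENT_PALETTE.items()}
    notes = []
    for pitch in range(roll.shape[0]):
        t = 0
        while t < roll.shape[1]:
            px = roll[pitch, t]
            mag = float(np.linalg.norm(px))
            if mag > threshold:
                # Nearest instrument by color direction.
                inst = min(
                    palette,
                    key=lambda k: np.linalg.norm(
                        px / mag - palette[k] / np.linalg.norm(palette[k])
                    ),
                )
                start = t
                while t < roll.shape[1] and np.linalg.norm(roll[pitch, t]) > threshold:
                    t += 1
                velocity = int(round(mag / np.linalg.norm(palette[inst]) * 127))
                notes.append((pitch, velocity, inst, start, t - start))
            else:
                t += 1
    return notes
```

Because both directions are deterministic, a note list survives a round trip through the image domain, which is what lets a continuous image diffusion model be trained on (and sampled back into) discrete MIDI-style scores.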