Lyric-to-melody generation is an important task in songwriting, and a challenging one: the generated melodies should not only follow good musical patterns but also align with features of the lyrics such as rhythm and structure. Neural generation models that learn the lyric-to-melody mapping end to end handle these requirements poorly, due to two issues: (1) a lack of aligned lyric-melody training data from which to learn lyric-melody feature alignment; (2) a lack of controllability in generation to explicitly guarantee that alignment. In this paper, we propose Re-creation of Creations (ROC), a new paradigm for lyric-to-melody generation that addresses these issues through a generation-retrieval pipeline. Specifically, our paradigm has two stages: (1) a creation stage, where a large number of music pieces are generated by a neural melody language model and indexed in a database by several key features (e.g., chords, tonality, rhythm, and structural information such as chorus or verse); (2) a re-creation stage, where melodies are recreated by retrieving music pieces from the database according to the key features extracted from the lyrics and concatenating the best pieces based on composition guidelines and melody language model scores. Our paradigm has two advantages: (1) it needs only unpaired melody data to train the melody language model, rather than the paired lyric-melody data required by previous models; (2) it achieves good lyric-melody feature alignment in lyric-to-melody generation. Experiments on English and Chinese datasets demonstrate that ROC outperforms previous neural-based lyric-to-melody generation models on both objective and subjective metrics.
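The two-stage pipeline can be illustrated with a minimal sketch. It is not the ROC implementation: the `Piece` type, the four-field feature key, the greedy per-segment selection, and the `smoothness_score` function (a toy stand-in for the melody language model score and composition guidelines) are all assumptions introduced here for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical feature key: (chord, tonality, rhythm pattern, section label).
FeatureKey = Tuple[str, str, str, str]


@dataclass(frozen=True)
class Piece:
    notes: Tuple[int, ...]  # MIDI pitches; a simplification for illustration
    key: FeatureKey


def build_index(pieces: Sequence[Piece]) -> Dict[FeatureKey, List[Piece]]:
    """Creation stage (sketch): index generated pieces by their key features."""
    index: Dict[FeatureKey, List[Piece]] = defaultdict(list)
    for piece in pieces:
        index[piece.key].append(piece)
    return index


def smoothness_score(context: Sequence[int], notes: Sequence[int]) -> float:
    """Toy stand-in for a language model score: penalize large pitch leaps,
    including the leap from the last context note into the candidate."""
    seq = list(context[-1:]) + list(notes)
    return -float(sum(abs(b - a) for a, b in zip(seq, seq[1:])))


def recreate(
    index: Dict[FeatureKey, List[Piece]],
    lyric_keys: Sequence[FeatureKey],
    score: Callable[[Sequence[int], Sequence[int]], float],
) -> List[int]:
    """Re-creation stage (sketch): for each lyric segment, retrieve the pieces
    whose features match and greedily append the best-scoring one."""
    melody: List[int] = []
    for key in lyric_keys:
        candidates = index.get(key, [])
        if not candidates:
            continue  # assumption: skip segments with no matching piece
        best = max(candidates, key=lambda p: score(melody, p.notes))
        melody.extend(best.notes)
    return melody


# Usage with made-up pieces and feature keys.
verse: FeatureKey = ("C", "C major", "long-short", "verse")
chorus: FeatureKey = ("F", "C major", "short-short", "chorus")
index = build_index([
    Piece((60, 62, 64), verse),
    Piece((60, 72, 60), verse),
    Piece((65, 67), chorus),
    Piece((76, 77), chorus),
])
melody = recreate(index, [verse, chorus], smoothness_score)
print(melody)  # [60, 62, 64, 65, 67]
```

Because retrieval is keyed on features derived from the lyrics, every selected piece matches those features by construction, which is the controllability the end-to-end approach lacks. The scoring function only ranks candidates that already match.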