Real-time music accompaniment generation has a wide range of applications in the music industry, such as music education and live performance. However, automatic real-time accompaniment generation is still understudied and often faces a trade-off between logical latency and exposure bias. In this paper, we propose SongDriver, a real-time music accompaniment generation system with neither logical latency nor exposure bias. Specifically, SongDriver divides the accompaniment generation task into two phases: 1) the arrangement phase, where a Transformer model arranges chords for the input melody in real time and caches them for the next phase instead of playing them out; 2) the prediction phase, where a CRF model generates playable multi-track accompaniment for the upcoming melody based on the previously cached chords. With this two-phase strategy, SongDriver directly generates the accompaniment for the upcoming melody, achieving zero logical latency. Furthermore, when predicting chords for a time step, SongDriver refers to the cached chords from the first phase rather than to its own previous predictions, which avoids the exposure bias problem. Since the input length is often constrained under real-time conditions, another potential problem is the loss of long-term sequential information. To compensate, we extract four musical features from the long span of music preceding the current time step and use them as global information. In the experiments, we train SongDriver on open-source datasets and on an original dataset, the àiSong Dataset, built from Chinese-style modern pop music scores. The results show that SongDriver outperforms existing state-of-the-art (SOTA) models on both objective and subjective metrics while significantly reducing physical latency.
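The two-phase control flow described in the abstract can be illustrated with a minimal sketch. It is not the SongDriver implementation: `arrange_chord` and `predict_accompaniment` are hypothetical rule-based stand-ins for the Transformer and the CRF, the melody representation (a list of MIDI pitches per time step), the `"rest"` placeholder, and the cache size are all assumptions, and the four global features are omitted. The sketch only shows that the accompaniment for a step is emitted before that step is heard, and that predictions read from the chord cache rather than from earlier predictions.

```python
from collections import deque
from typing import Deque, List, Sequence

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def arrange_chord(melody_step: Sequence[int]) -> str:
    """Hypothetical stand-in for the arrangement-phase Transformer.

    Labels a melody step with the pitch class of its lowest MIDI note.
    """
    return PITCH_CLASSES[min(melody_step) % 12]


def predict_accompaniment(cached_chords: Sequence[str]) -> str:
    """Hypothetical stand-in for the prediction-phase CRF.

    Simply reuses the most recently cached chord for the upcoming step.
    """
    return cached_chords[-1]


def accompany(melody_steps: Sequence[Sequence[int]], cache_size: int = 4) -> List[str]:
    """Return the accompaniment emitted for each melody step."""
    cache: Deque[str] = deque(maxlen=cache_size)  # arranged chords, never played directly
    played: List[str] = []
    for step in melody_steps:
        # Prediction phase: runs before `step` is observed, so the output is
        # ready when the melody arrives (no logical latency). It reads only
        # the cache, never `played`, so earlier predictions are not fed back.
        played.append(predict_accompaniment(cache) if cache else "rest")
        # Arrangement phase: once `step` has been heard, arrange a chord for
        # it and cache it for later predictions instead of playing it.
        cache.append(arrange_chord(step))
    return played


if __name__ == "__main__":
    print(accompany([[60, 64], [62], [67, 71]]))
```

In this toy version the first step has nothing in the cache, so a placeholder is emitted; how the real system handles its first steps is not stated in the abstract.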