Creating a complex work of art such as music requires profound creativity. Recent advances in deep learning, and in powerful models such as Transformers, have driven substantial progress in automatic music generation. In accompaniment generation, creating a coherent drum pattern with apposite fills and improvisations at the proper locations in a song is challenging even for an experienced drummer. Drum beats tend to follow a repetitive pattern through stanzas, with fills or improvisation at section boundaries. In this work, we tackle the task of drum pattern generation conditioned on the accompanying music played by four melodic instruments: Piano, Guitar, Bass, and Strings. We use a Transformer sequence-to-sequence model to generate a basic drum pattern conditioned on the melodic accompaniment, and find that improvisation is largely absent, possibly because improvised bars are expected to be relatively rare in the training data. We propose a novelty function to capture the extent of improvisation in a bar relative to its neighbors. We train a model to predict improvisation locations from the melodic accompaniment tracks. Finally, we use a novel BERT-inspired in-filling architecture to learn the structure of both the drums and the melody and to in-fill elements of improvised music.
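The abstract does not define the novelty function, so the following is only an illustrative sketch of the general idea of scoring a bar against its neighbors, not the authors' formulation. It assumes a hypothetical encoding in which each bar is a flattened binary onset grid, and it uses the mean Hamming distance to nearby bars as the score; the names `hamming`, `bar_novelty`, and the `window` parameter are all assumptions introduced for this example.

```python
from typing import List, Sequence

# Hypothetical encoding: one bar is a flattened binary onset grid
# (1 = drum hit at that step, 0 = silence). Not taken from the paper.
Bar = Sequence[int]


def hamming(a: Bar, b: Bar) -> float:
    """Fraction of grid cells in which two equally sized bars differ."""
    if len(a) != len(b):
        raise ValueError("bars must have the same length")
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def bar_novelty(bars: Sequence[Bar], window: int = 2) -> List[float]:
    """Mean distance from each bar to its neighbors within `window` bars.

    A bar that repeats the surrounding groove scores near 0; a bar that
    departs from it (for example a fill) scores higher.
    """
    scores: List[float] = []
    for i, bar in enumerate(bars):
        lo = max(0, i - window)
        hi = min(len(bars), i + window + 1)
        neighbors = [bars[j] for j in range(lo, hi) if j != i]
        if not neighbors:
            scores.append(0.0)  # a single-bar song has nothing to compare to
            continue
        total = sum(hamming(bar, n) for n in neighbors)
        scores.append(total / len(neighbors))
    return scores


# Toy example: a repeated groove with a fill every fourth bar.
groove = [1, 0, 0, 0, 1, 0, 0, 0]
fill = [1, 1, 1, 0, 1, 1, 1, 1]
pattern = [groove, groove, groove, fill, groove, groove, groove, fill]
scores = bar_novelty(pattern)
```

In this toy pattern the fill bars receive the highest scores, so thresholding `scores` would flag them as candidate improvisation locations. A real system would likely need a distance that accounts for velocity, instrument identity, and timing, which this sketch ignores.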