This paper proposes a novel Transformer-based model for music score infilling, which generates a musical passage that fills the gap between given past and future contexts. While existing infilling approaches can generate a passage that connects smoothly with the given contexts at the local level, they do not take into account the musical form or structure of the piece and may therefore generate overly smooth results. To address this issue, we propose a structure-aware conditioning approach that employs a novel attention-selecting module to supply user-provided structure-related information to the Transformer for infilling. With both objective and subjective evaluations, we show that the proposed model can harness the structural information effectively and generate pop-style melodies of higher quality than those produced by two existing structure-agnostic infilling models.