Attention-based Transformer models have been increasingly employed for automatic music generation. To condition the generation process of such a model with a user-specified sequence, a popular approach is to take that conditioning sequence as a priming sequence and ask a Transformer decoder to generate a continuation. However, this prompt-based conditioning cannot guarantee that the conditioning sequence would develop or even simply repeat itself in the generated continuation. In this paper, we propose an alternative conditioning approach, called theme-based conditioning, that explicitly trains the Transformer to treat the conditioning sequence as a thematic material that has to manifest itself multiple times in its generation result. This is achieved with two main technical contributions. First, we propose a deep learning-based approach that uses contrastive representation learning and clustering to automatically retrieve thematic materials from music pieces in the training data. Second, we propose a novel gated parallel attention module to be used in a sequence-to-sequence (seq2seq) encoder/decoder architecture to more effectively account for a given conditioning thematic material in the generation process of the Transformer decoder. We report on objective and subjective evaluations of variants of the proposed Theme Transformer and the conventional prompt-based baseline, showing that our best model can generate, to some extent, polyphonic pop piano music with repetition and plausible variations of a given condition.
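The theme-retrieval step pairs contrastive representation learning with clustering: fragments of a piece are embedded so that variations of the same melodic idea land close together, and recurring clusters become theme candidates. The abstract does not spell out the loss, so the following is a generic SimCLR-style NT-Xent sketch (the function name, temperature, and pairing scheme are assumptions, not the paper's exact formulation); after embeddings are learned, similar fragments could be grouped with, e.g., k-means to surface thematic material.

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.1):
    """Simplified NT-Xent contrastive loss over two views (z1[i], z2[i])
    of the same musical fragment. Illustrative sketch only."""
    n = len(z1)
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # L2-normalize embeddings
    sim = z @ z.T / temperature                        # temperature-scaled cosine sims
    np.fill_diagonal(sim, -np.inf)                     # exclude self-similarity
    # the positive for row i is row i+n (and vice versa)
    targets = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    log_probs = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(2 * n), targets].mean()
```

Fragments whose embeddings form a tight cluster across a piece would then be treated as occurrences of one theme.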
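The gated parallel attention module is only named, not specified, in this abstract. A plausible reading is that each decoder layer runs self-attention over the partial generation and cross-attention over the encoded theme in parallel, then mixes the two streams with a learned sigmoid gate. The sketch below follows that reading (the gating form, shapes, and all names are assumptions):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # standard scaled dot-product attention
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def gated_parallel_attention(x, theme, w_gate, b_gate):
    """Mix decoder self-attention with cross-attention to the theme
    encoding via an element-wise sigmoid gate (hypothetical sketch).
    x:     (T, d) decoder states; theme: (S, d) theme encoder output
    w_gate: (2d, d) gate projection; b_gate: (d,) gate bias
    """
    a_self = attention(x, x, x)            # attend over the generated sequence
    a_cross = attention(x, theme, theme)   # attend over the conditioning theme
    pre_gate = np.concatenate([a_self, a_cross], axis=-1) @ w_gate + b_gate
    gate = 1.0 / (1.0 + np.exp(-pre_gate)) # per-dimension mixing weights in (0, 1)
    return gate * a_self + (1.0 - gate) * a_cross
```

The gate lets the decoder decide, position by position, how strongly to pull the theme material back into the generation rather than relying on the prompt alone.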