Following the success of the transformer architecture in the natural language domain, transformer-like architectures have recently been widely applied to symbolic music. Symbolic music and text, however, are two different modalities. Symbolic music contains multiple attributes, both absolute (e.g., pitch) and relative (e.g., pitch interval). The relative attributes shape human perception of musical motifs, yet most existing symbolic music modeling methods ignore them, mainly because they lack a musically meaningful embedding space in which both the absolute and the relative embeddings of symbolic music tokens can be efficiently represented. In this paper, we propose the Fundamental Music Embedding (FME) for symbolic music, based on a bias-adjusted sinusoidal encoding within which both absolute and relative attributes can be embedded and fundamental musical properties (e.g., translational invariance) are explicitly preserved. Taking advantage of the proposed FME, we further propose a novel attention mechanism based on relative index, pitch and onset embeddings (RIPO attention), so that musical domain knowledge can be fully utilized for symbolic music modeling. Experimental results show that our proposed model, the RIPO transformer, which uses FME and RIPO attention, outperforms state-of-the-art transformers (i.e., the music transformer and the linear transformer) in a melody completion task. Moreover, when the RIPO transformer is used in a downstream music generation task, we observe that the notorious degeneration phenomenon no longer occurs, and the music it generates outperforms the music generated by state-of-the-art transformer models in both subjective and objective evaluations.
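To illustrate what translational invariance means for a sinusoidal pitch embedding, here is a minimal sketch. It is not the FME implementation: the function names, the embedding dimension, the frequency schedule (borrowed from the standard transformer positional encoding), and the random bias standing in for a learned one are all assumptions made for illustration. It checks that the embedding distance between two pitches depends only on their interval, so a C major triad and its transposition to F major give the same distances.

```python
import numpy as np


def sinusoidal_embedding(values, dim=8, base=10000.0, bias=None):
    """Embed scalars (e.g. MIDI pitches) as sin/cos pairs plus an optional bias.

    Hypothetical helper: the frequency schedule base**(-2k/dim) is the
    standard transformer one, assumed here for illustration only.
    """
    values = np.asarray(values, dtype=np.float64)[:, None]   # shape (n, 1)
    k = np.arange(dim // 2, dtype=np.float64)                # shape (dim/2,)
    freqs = base ** (-2.0 * k / dim)
    angles = values * freqs                                  # shape (n, dim/2)
    emb = np.concatenate([np.sin(angles), np.cos(angles)], axis=1)
    if bias is not None:
        emb = emb + bias                                     # same offset for every token
    return emb


def interval_distances(emb):
    """Euclidean distance between each pair of consecutive embeddings."""
    return np.linalg.norm(emb[1:] - emb[:-1], axis=1)


rng = np.random.default_rng(0)
bias = rng.normal(size=8)            # stand-in for a learned bias vector (assumption)

c_major = sinusoidal_embedding([60, 64, 67], bias=bias)  # C4, E4, G4
f_major = sinusoidal_embedding([65, 69, 72], bias=bias)  # same intervals, 5 semitones up

d_c = interval_distances(c_major)
d_f = interval_distances(f_major)
print(np.allclose(d_c, d_f))
```

The check should print `True`. For each sin/cos pair, the squared distance between the embeddings of pitches a and b reduces to 2 - 2cos((a - b)ω), which depends only on the interval a - b, and a shared bias cancels in the subtraction.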