In this paper, we explore a tokenized representation of musical scores and use the Transformer model to generate musical scores automatically. Thus far, sequence models have yielded fruitful results with note-level (MIDI-equivalent) symbolic representations of music. Although note-level representations carry enough information to reproduce music aurally, they do not carry enough information to represent music visually as notation. Musical scores contain various musical symbols (e.g., clef, key signature, and notes) and attributes (e.g., stem direction, beam, and tie) that enable us to visually comprehend musical content. However, automated estimation of these elements has yet to be comprehensively addressed. In this paper, we first design a score token representation corresponding to the various musical elements. We then train the Transformer model to transcribe a note-level representation into appropriate music notation. Evaluations on popular piano scores show that the proposed method significantly outperforms existing methods on all 12 musical aspects that were investigated. We also explore which notation-level token representation works effectively with the model and find that our proposed representation produces the most stable results.
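To make the idea of a score token representation concrete, the following is a minimal sketch of how notated elements might be flattened into a token sequence. It is an illustration only: the `Note` class, the `score_tokens` function, and token spellings such as `clef_treble` or `stem_up` are assumptions made for this example and are not taken from the paper's actual vocabulary.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Note:
    """One notated note (hypothetical structure for illustration)."""
    pitch: str                   # spelled pitch, e.g. "C4"
    length: str                  # notated length, e.g. "eighth"
    stem: Optional[str] = None   # "up" or "down"
    beam: Optional[str] = None   # "start", "continue", or "stop"
    tie: Optional[str] = None    # "start" or "stop"


def score_tokens(clef: str, key: str, time: str, notes: List[Note]) -> List[str]:
    """Flatten score symbols and note attributes into one token list."""
    # Staff-level symbols come first.
    tokens = [f"clef_{clef}", f"key_{key}", f"time_{time}"]
    for n in notes:
        # Each note contributes its pitch and length tokens ...
        tokens.append(f"note_{n.pitch}")
        tokens.append(f"len_{n.length}")
        # ... followed by any visual attributes that are set.
        for name in ("stem", "beam", "tie"):
            value = getattr(n, name)
            if value is not None:
                tokens.append(f"{name}_{value}")
    return tokens


example = score_tokens(
    "treble", "C_major", "4/4",
    [
        Note("C4", "eighth", stem="up", beam="start"),
        Note("D4", "eighth", stem="up", beam="stop"),
        Note("E4", "half", stem="up", tie="start"),
    ],
)
print(" ".join(example))
```

A note-level (MIDI-equivalent) input would carry only pitch and timing, so attributes such as stem direction, beaming, and ties are exactly the elements a sequence model would have to infer when producing this kind of output.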