Symbolic music is nowadays mostly represented as discrete tokens and processed with sequential models such as Transformers for deep learning tasks. Recent research has focused on tokenization, i.e., the conversion of the data into sequences of integers intelligible to such models. This can be achieved in many ways, as music can comprise simultaneous tracks and simultaneous notes, each with several attributes. Until now, the proposed tokenizations have relied on small vocabularies describing the note attributes and time events, resulting in fairly long token sequences. In this paper, we show how Byte Pair Encoding (BPE) can improve the results of deep learning models while also improving their efficiency. We experiment on music generation and composer classification, study the impact of BPE on how models learn the embeddings, and show that it can help to increase their isotropy, i.e., the uniformity of the variance of their positions in the embedding space.
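To make the idea concrete, BPE iteratively replaces the most frequent pair of adjacent tokens with a new token, shrinking sequences while growing the vocabulary. The sketch below is a minimal, generic illustration over integer token sequences (not the paper's actual implementation); the function name and structure are hypothetical.

```python
from collections import Counter

def learn_bpe(seqs, num_merges):
    """Minimal BPE sketch over integer token sequences.

    Each merge replaces the currently most frequent adjacent
    pair with a fresh token id, so sequences shorten as the
    vocabulary grows.
    """
    seqs = [list(s) for s in seqs]
    next_id = max(max(s) for s in seqs) + 1  # first new vocabulary id
    merges = {}
    for _ in range(num_merges):
        # Count adjacent pairs across all sequences
        pairs = Counter()
        for s in seqs:
            pairs.update(zip(s, s[1:]))
        if not pairs:
            break
        (a, b), _count = pairs.most_common(1)[0]
        merges[(a, b)] = next_id
        # Apply the merge to every sequence
        for i, s in enumerate(seqs):
            out, j = [], 0
            while j < len(s):
                if j + 1 < len(s) and (s[j], s[j + 1]) == (a, b):
                    out.append(next_id)
                    j += 2
                else:
                    out.append(s[j])
                    j += 1
            seqs[i] = out
        next_id += 1
    return merges, seqs

# One merge over a toy sequence: the pair (1, 2) occurs twice,
# so it is replaced by the new token id 5.
merges, merged = learn_bpe([[1, 2, 3, 1, 2, 4]], num_merges=1)
```

Applied to music tokens, frequent combinations such as a pitch token followed by its velocity token can collapse into single vocabulary entries, which is how BPE shortens the sequences a Transformer must process.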