Subword tokenization has been widely successful in text-based natural language processing (NLP) tasks with Transformer-based models. As Transformer models become increasingly popular in symbolic music research, it is imperative to investigate the efficacy of subword tokenization in the symbolic music domain. In this paper, we explore subword tokenization techniques, such as byte-pair encoding (BPE), in symbolic music generation and their impact on the overall structure of generated songs. Our experiments use three types of MIDI datasets: single-track melody only, multi-track with a single instrument, and multi-track with multiple instruments. We apply subword tokenization on top of existing music tokenization schemes and find that it enables the generation of longer songs in the same amount of time while also improving the overall structure of the generated music, as measured by objective metrics such as the structure indicator (SI) and pitch class entropy. We also compare two subword tokenization methods, BPE and Unigram, and observe that both lead to consistent improvements. Our study suggests that subword tokenization is a promising technique for symbolic music generation and may have broader implications for music composition, particularly in cases involving complex data such as multi-track songs.
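The sketch below illustrates the core idea. It learns byte-pair-encoding merges over sequences of symbolic music tokens, so that tokens which frequently appear together (for example, a velocity followed by a duration) become a single vocabulary entry. Several details are hypothetical illustrations rather than the paper's implementation: the REMI-style token names, the `+` separator, and the helper names `learn_bpe`, `encode`, and `decode`. Unigram tokenization, the second method compared, is not shown.

```python
from collections import Counter
from typing import List, Tuple

Token = str
Pair = Tuple[Token, Token]


def count_pairs(seqs: List[List[Token]]) -> Counter:
    """Count adjacent token pairs across all sequences."""
    pairs: Counter = Counter()
    for seq in seqs:
        for a, b in zip(seq, seq[1:]):
            pairs[(a, b)] += 1
    return pairs


def merge_pair(seq: List[Token], pair: Pair, new_tok: Token) -> List[Token]:
    """Replace non-overlapping occurrences of `pair` (scanning left to right) with `new_tok`."""
    out: List[Token] = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_tok)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def learn_bpe(seqs: List[List[Token]], num_merges: int) -> Tuple[List[Pair], List[List[Token]]]:
    """Greedily merge the most frequent adjacent pair, up to `num_merges` times."""
    merges: List[Pair] = []
    for _ in range(num_merges):
        pairs = count_pairs(seqs)
        if not pairs:
            break
        pair, freq = pairs.most_common(1)[0]
        if freq < 2:  # merging a pair seen only once gives no compression
            break
        merges.append(pair)
        seqs = [merge_pair(s, pair, pair[0] + "+" + pair[1]) for s in seqs]
    return merges, seqs


def encode(seq: List[Token], merges: List[Pair]) -> List[Token]:
    """Apply learned merges to a new sequence, in the order they were learned."""
    for pair in merges:
        seq = merge_pair(seq, pair, pair[0] + "+" + pair[1])
    return seq


def decode(seq: List[Token]) -> List[Token]:
    """Split merged tokens back into base tokens (assumes base tokens contain no '+')."""
    return [base for tok in seq for base in tok.split("+")]


# Hypothetical REMI-style token sequences for two short melodies.
CORPUS = [
    ["Bar", "Position_0", "Pitch_60", "Velocity_80", "Duration_4",
     "Position_4", "Pitch_64", "Velocity_80", "Duration_4"],
    ["Bar", "Position_0", "Pitch_60", "Velocity_80", "Duration_4",
     "Position_8", "Pitch_67", "Velocity_80", "Duration_2"],
]

merges, merged = learn_bpe(CORPUS, num_merges=3)
print(merges[0])  # ('Velocity_80', 'Duration_4')
for original, compressed in zip(CORPUS, merged):
    print(len(original), "->", len(compressed))
```

Because every merged token can be split back into its base tokens, the encoding is lossless. Each merge also shortens the sequence, so a model with a fixed context window sees more musical content per token. This is one plausible reason subword tokenization can let a model produce longer songs.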