A Comparative Analysis of Different Pitch and Metrical Grid Encoding Methods in the Task of Sequential Music Generation

Pitch and meter are two fundamental music features for symbolic music generation tasks, where researchers usually choose different encoding methods depending on their specific goals. However, the advantages and drawbacks of the different encoding methods have rarely been discussed. This paper presents an integrated analysis of how two low-level features, pitch and meter, influence the performance of a token-based sequential music generation model. First, the commonly used MIDI number encoding is compared with a less common class-octave encoding. Second, a dense intra-bar metric grid is imposed on the encoded sequence as an auxiliary feature, and different complexities and resolutions of the metric grid are compared. For complexity, the single-token approach and the multiple-token approach are compared; for grid resolution, 0 (ablation), 1 (bar-level), 4 (downbeat-level), 12 (8th-triplet-level), up to 64 (64th-note-grid-level) are compared; for duration resolution, 4, 8, 12 and 16 subdivisions per beat are compared. Each encoding is tested on a separately trained Transformer-XL model for a melody generation task. Measured by how closely the distributions of several objective evaluation metrics match those of the test dataset, the results suggest that the class-octave encoding significantly outperforms the taken-for-granted MIDI encoding on pitch-related metrics; finer grids and multiple-token grids improve rhythmic quality, but also suffer from over-fitting at an early training stage. The results show a general over-fitting phenomenon in two respects: the pitch embedding space and the test loss of the single-token grid encoding. From a practical perspective, we demonstrate the feasibility of using smaller networks and lower embedding dimensions for the generation task, while raising the concern that such models over-fit easily. The findings can also inform the feature engineering of future models.
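
To make the two encoding ideas concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes that class-octave encoding splits a MIDI number into a pitch class (0 to 11) and a zero-based octave index (MIDI number divided by 12), and that a grid resolution counts equal cells per bar, with 0 meaning no grid. The function names and the representation of an onset as a fraction of the bar are hypothetical choices made for illustration only.

```python
import math
from fractions import Fraction
from typing import Optional, Tuple


def midi_to_class_octave(midi_number: int) -> Tuple[int, int]:
    """Split a MIDI note number into (pitch_class, octave_index).

    Assumption: the octave index is simply midi_number // 12,
    so middle C (MIDI 60) maps to pitch class 0, octave index 5.
    """
    if not 0 <= midi_number <= 127:
        raise ValueError("MIDI note numbers range from 0 to 127")
    octave_index, pitch_class = divmod(midi_number, 12)
    return pitch_class, octave_index


def class_octave_to_midi(pitch_class: int, octave_index: int) -> int:
    """Inverse of midi_to_class_octave."""
    return octave_index * 12 + pitch_class


def grid_index(onset_in_bar: Fraction, grid_resolution: int) -> Optional[int]:
    """Return the index of the grid cell that contains an onset.

    onset_in_bar is the onset position as a fraction of the bar, in [0, 1).
    Assumption: grid_resolution is the number of equal cells per bar;
    a resolution of 0 stands for the ablation case (no grid feature).
    """
    if grid_resolution == 0:
        return None
    if not 0 <= onset_in_bar < 1:
        raise ValueError("onset_in_bar must lie in [0, 1)")
    return math.floor(onset_in_bar * grid_resolution)


if __name__ == "__main__":
    print(midi_to_class_octave(60))
    print(grid_index(Fraction(1, 2), 4))
```

Under these assumptions, two notes an octave apart share the same pitch-class value and differ only in the octave index, whereas their MIDI numbers are unrelated integers from the model's point of view.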
