Symbolic music segmentation is the process of dividing symbolic melodies into smaller meaningful groups, such as melodic phrases. We propose an unsupervised method for segmenting symbolic music. The proposed model is based on an ensemble of temporal prediction error models. During training, each model learns to predict the next token, with the aim of identifying changes between musical phrases. At test time, we run a peak detection algorithm to select segment candidates. Finally, we aggregate the predictions of all models in the ensemble to produce the final segmentation. Results suggest that the proposed method reaches state-of-the-art performance on the Essen Folksong dataset in the unsupervised setting, as measured by F-score and R-value. We also provide an ablation study to assess the contribution of each model component to the final results. As expected, the proposed method still underperforms supervised methods, which leaves room for future research on closing the gap between unsupervised and supervised segmentation.
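The abstract does not specify the error measure, the peak-picking rule, or the aggregation scheme, so the following is only a minimal sketch of the test-time pipeline under stated assumptions. It assumes that prediction error is surprisal, -log p(x_t | x_<t), that a candidate boundary is a local maximum of the error curve above a threshold, and that the ensemble keeps a boundary when at least `min_votes` models propose it. It also computes exact-match precision, recall, F-score, and the standard R-value. All function names, the default threshold (mean plus one standard deviation), and the toy error curves are hypothetical and are not taken from the paper.

```python
import math
import statistics


def surprisal(next_token_probs):
    """Prediction error per position, assumed here to be -log p(x_t | x_<t).

    `next_token_probs[t]` is the probability a model assigned to the token that
    actually occurred at position t (a hypothetical input format)."""
    return [-math.log(p) for p in next_token_probs]


def detect_peaks(errors, threshold=None):
    """Indices that are local maxima of the error curve and exceed `threshold`.

    If no threshold is given, fall back to mean + one population standard
    deviation (a hypothetical default, not taken from the paper)."""
    if threshold is None:
        threshold = statistics.mean(errors) + statistics.pstdev(errors)
    return [
        i
        for i in range(1, len(errors) - 1)
        if errors[i - 1] < errors[i] >= errors[i + 1] and errors[i] > threshold
    ]


def aggregate(candidate_sets, n_tokens, min_votes):
    """Keep a boundary if at least `min_votes` ensemble members proposed it."""
    votes = [0] * n_tokens
    for candidates in candidate_sets:
        for i in set(candidates):
            votes[i] += 1
    return [i for i, v in enumerate(votes) if v >= min_votes]


def boundary_scores(predicted, reference):
    """Exact-match precision, recall, F-score and R-value for boundary indices."""
    pred, ref = set(predicted), set(reference)
    hits = len(pred & ref)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    # R-value: combines the hit rate (recall) with the over-segmentation ratio.
    over_seg = len(pred) / len(ref) - 1 if ref else 0.0
    r1 = math.sqrt((1 - recall) ** 2 + over_seg ** 2)
    r2 = (-over_seg + recall - 1) / math.sqrt(2)
    r_value = 1 - (abs(r1) + abs(r2)) / 2
    return precision, recall, f_score, r_value


if __name__ == "__main__":
    # Toy prediction-error curves from three hypothetical ensemble members.
    errors = [
        [0.5, 0.4, 2.0, 0.3, 0.2, 0.4, 1.8, 0.3, 0.2, 0.1],
        [0.6, 0.5, 1.9, 0.4, 0.3, 1.4, 0.5, 0.4, 0.3, 0.2],
        [0.4, 0.3, 0.2, 1.5, 0.3, 0.2, 1.7, 0.2, 0.3, 0.1],
    ]
    candidates = [detect_peaks(e, threshold=1.0) for e in errors]
    print(candidates)  # [[2, 6], [2, 5], [3, 6]]
    boundaries = aggregate(candidates, n_tokens=10, min_votes=2)
    print(boundaries)  # [2, 6]
    print(boundary_scores(boundaries, reference=[2, 6, 8]))
```

Voting lets boundaries on which several models agree survive while discarding isolated peaks (positions 3 and 5 above). This sketch matches boundaries exactly; a practical evaluation would usually allow a small positional tolerance, which would also let near-miss candidates such as 2 and 3 vote together.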