Music structure analysis (MSA) systems aim to segment a song recording into non-overlapping sections with useful labels. Previous MSA systems typically predict abstract labels in a post-processing step and require the full context of the song. By contrast, we recently proposed a supervised framework, called "Music Structural Function Analysis" (MuSFA), that models and predicts meaningful labels like 'verse' and 'chorus' directly from audio, without requiring the full context of a song. However, the performance of this system depends on the amount and quality of training data. In this paper, we propose to repurpose a public dataset, the HookTheory Lead Sheet Dataset (HLSD), to improve its performance. HLSD contains over 18K excerpts of music sections originally collected for studying automatic melody harmonization. We treat each excerpt as a partially labeled song and provide a label mapping, so that HLSD can be used together with other public datasets, such as SALAMI, RWC, and Isophonics. In cross-dataset evaluations, we find that including HLSD in training can improve state-of-the-art boundary detection and section labeling scores by ~3% and ~1%, respectively.
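To make the idea of a label mapping and a "partially labeled song" concrete, the following Python sketch shows one possible reading. It is an illustration only, not the mapping or training code used in this work: the entries of `LABEL_MAP`, the helper names `map_label` and `frame_targets`, and the 0.5-second frame hop are all assumptions chosen for the example.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical mapping from raw section names to a shared label set.
# These entries are illustrative assumptions, not the published mapping.
LABEL_MAP: Dict[str, str] = {
    "intro": "intro",
    "verse": "verse",
    "chorus": "chorus",
    "refrain": "chorus",
    "bridge": "bridge",
    "outro": "outro",
}


def map_label(raw: str) -> Optional[str]:
    """Normalize a raw section name; return None if it has no mapping."""
    return LABEL_MAP.get(raw.strip().lower())


def frame_targets(
    duration: float,
    spans: List[Tuple[float, float, str]],
    hop: float = 0.5,
) -> Tuple[List[Optional[str]], List[int]]:
    """Build per-frame labels and a 0/1 mask for a partially labeled song.

    Frames whose center falls inside an annotated span (start, end, raw label)
    receive the mapped label and mask 1. All other frames keep label None and
    mask 0, so a training loss could skip them.
    """
    n_frames = math.ceil(duration / hop)
    labels: List[Optional[str]] = [None] * n_frames
    mask: List[int] = [0] * n_frames
    for start, end, raw in spans:
        mapped = map_label(raw)
        if mapped is None:
            continue  # unmapped names stay unlabeled
        for i in range(n_frames):
            center = (i + 0.5) * hop
            if start <= center < end:
                labels[i] = mapped
                mask[i] = 1
    return labels, mask


# Usage: a 10-second recording where only a 2-second excerpt is annotated.
labels, mask = frame_targets(10.0, [(2.0, 4.0, "Chorus")], hop=0.5)
```

Under these assumptions, only the frames covered by the annotated excerpt carry a label, which is how excerpt-level annotations could sit alongside fully annotated songs from other datasets in one training set.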