Music commonly has a clear hierarchical structure, especially in the singing parts, which usually carry the main melody in pop songs. However, most existing singing annotation datasets record only the symbolic information of musical notes, ignoring the structure of the music. In this paper, we propose a hierarchical singing annotation dataset consisting of 68 pop songs from YouTube. The dataset records the onset/offset time, pitch, duration, and lyrics of each musical note in an enhanced LyRiCs format that captures the hierarchical structure of the music. We annotate each song in a two-stage process: first, we create initial labels from the corresponding musical notation and lyrics file; second, we manually calibrate these labels against the raw audio. We validate the labeling accuracy of the proposed dataset mainly by comparing it with an automatic singing transcription (AST) dataset. The results indicate that the proposed dataset matches the labeling accuracy of existing AST datasets.
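To make the per-note annotation concrete, the following is a minimal sketch of how one such record (onset/offset time, pitch, duration, and lyric) might be represented in code. The class and field names are illustrative assumptions, not the dataset's actual schema:

```python
from dataclasses import dataclass


@dataclass
class NoteAnnotation:
    """One annotated musical note.

    Field names are hypothetical, chosen to mirror the attributes
    described in the abstract, not the dataset's real format.
    """
    onset: float   # onset time in seconds
    offset: float  # offset time in seconds
    pitch: int     # pitch as a MIDI note number
    lyric: str     # lyric syllable sung on this note

    @property
    def duration(self) -> float:
        # Duration is derived from the onset/offset pair.
        return self.offset - self.onset


# Example: a single note annotation.
note = NoteAnnotation(onset=12.50, offset=12.95, pitch=67, lyric="love")
print(round(note.duration, 2))
```

A song would then be a sequence of such records, grouped into phrases and sections to reflect the hierarchical structure the dataset aims to preserve.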