Whereas chord transcription has received considerable attention over the past two decades, far less work has been devoted to transcribing and encoding the rhythmic patterns that occur in a song. The topic is especially relevant for instruments such as the rhythm guitar, which is typically played by strumming rhythmic patterns that repeat and vary over time. However, for a given song section there is often no single, objectively "right" rhythmic pattern. To create a dataset with well-defined ground-truth labels, we therefore asked expert musicians to transcribe the rhythmic patterns in 410 popular songs and to record cover versions in which the guitar tracks follow those transcriptions. To transcribe the strums and their corresponding rhythmic patterns, we propose a three-step framework. First, we perform approximate stem separation to extract the guitar part from the polyphonic mixture. Second, we detect individual strums in the separated guitar audio, using a pre-trained foundation model (MERT) as a backbone. Finally, we carry out a pattern-decoding step in which the transcribed sequence of guitar strums is represented by patterns drawn from an expert-curated vocabulary. We show that the rhythmic patterns of the guitar track in polyphonic music can be transcribed with quite high accuracy, yielding a human-readable representation that includes automatically detected bar lines and time-signature markers. We also perform ablation studies and an error analysis, and propose a set of evaluation metrics to assess the accuracy and readability of the predicted rhythmic-pattern sequence.
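The abstract does not say how the pattern-decoding step works internally. As a rough illustration only, the sketch below assumes that detected strums have already been quantized to an eighth-note grid per 4/4 bar. It then labels each bar with a pattern from a small hypothetical vocabulary using Viterbi decoding, with Hamming distance as the per-bar mismatch cost and a fixed penalty (`switch_cost`) for changing patterns between consecutive bars, which favors repeated, readable patterns. The vocabulary, grid resolution, cost function, and the names `VOCAB`, `hamming`, and `decode_patterns` are all assumptions made for this example, not the authors' method.

```python
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[int, ...]

# Hypothetical vocabulary: one 4/4 bar on an eighth-note grid (8 slots),
# 1 = strum onset, 0 = no strum. Illustrative only, not the paper's vocabulary.
VOCAB: Dict[str, Pattern] = {
    "quarters": (1, 0, 1, 0, 1, 0, 1, 0),
    "eighths": (1, 1, 1, 1, 1, 1, 1, 1),
    "folk": (1, 0, 1, 1, 0, 1, 1, 1),
}


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of grid slots where two bars disagree."""
    return sum(x != y for x, y in zip(a, b))


def decode_patterns(bars: List[Pattern], vocab: Dict[str, Pattern],
                    switch_cost: float = 1.0) -> List[str]:
    """Viterbi decoding: label each quantized bar with a vocabulary pattern,
    trading off per-bar mismatch (Hamming distance) against a fixed penalty
    for switching patterns between consecutive bars."""
    if not bars:
        return []
    names = list(vocab)
    n = len(names)
    # cost[j]: best total cost of any labeling that ends with names[j]
    cost = [float(hamming(bars[0], vocab[name])) for name in names]
    backptrs: List[List[int]] = []
    for bar in bars[1:]:
        new_cost: List[float] = []
        ptr: List[int] = []
        for j, name in enumerate(names):
            # Best predecessor, charging switch_cost when the pattern changes.
            trans = [cost[i] + (0.0 if i == j else switch_cost) for i in range(n)]
            best_i = min(range(n), key=trans.__getitem__)
            new_cost.append(trans[best_i] + hamming(bar, vocab[name]))
            ptr.append(best_i)
        cost = new_cost
        backptrs.append(ptr)
    # Backtrack from the cheapest final pattern.
    j = min(range(n), key=cost.__getitem__)
    path = [j]
    for ptr in reversed(backptrs):
        j = ptr[j]
        path.append(j)
    return [names[k] for k in reversed(path)]


# Usage: four quantized bars; the third bar is "folk" with one missed strum.
bars = [
    (1, 0, 1, 0, 1, 0, 1, 0),
    (1, 0, 1, 1, 0, 1, 1, 1),
    (1, 0, 1, 1, 0, 1, 0, 1),
    (1, 0, 1, 1, 0, 1, 1, 1),
]
print(decode_patterns(bars, VOCAB))  # ['quarters', 'folk', 'folk', 'folk']
```

Setting `switch_cost=0` reduces this to independent nearest-pattern matching per bar; larger values smooth over isolated strum-detection errors, at the risk of hiding genuine one-bar variations.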