Audio-to-score alignment (A2SA) is a multimodal task that consists of aligning audio signals to music scores. Recent literature confirms the benefits of Automatic Music Transcription (AMT) for A2SA at the frame level. In this work, we build on AMT Deep Learning (DL) models to achieve alignment at the note level. We propose a method that combines AMT with HMM-based score-to-score alignment, yielding a marked improvement over the state of the art. We also design a systematic procedure to exploit large datasets that do not provide an aligned score. Finally, we perform a thorough comparison and extensive tests on multiple datasets.
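To make the note-level idea concrete, the following is a minimal sketch under strong assumptions: the AMT output and the reference score are given as (onset, MIDI pitch) lists, and a simple DTW over pitch and onset features stands in for the HMM-based score-to-score alignment used in the paper. All names (`transcribed_notes`, `score_notes`, `dtw_align`) are illustrative, not the authors' implementation.

```python
import numpy as np

def dtw_align(a, b, cost):
    """Return the optimal warping path between note sequences a and b."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = cost(a[i - 1], b[j - 1]) + min(
                D[i - 1, j], D[i, j - 1], D[i - 1, j - 1]
            )
    # Backtrack from the end to recover the matched note pairs.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

# Toy data: (onset in seconds, MIDI pitch). In a real pipeline, `transcribed_notes`
# would be produced by an AMT deep-learning model run on the audio recording.
transcribed_notes = [(0.02, 60), (0.51, 62), (1.03, 64), (1.55, 65)]
score_notes       = [(0.00, 60), (0.50, 62), (1.00, 64), (1.50, 65)]

# Cost favours matching pitches and nearby onset positions.
def note_cost(t_note, s_note):
    pitch_penalty = 0.0 if t_note[1] == s_note[1] else 2.0
    return pitch_penalty + abs(t_note[0] - s_note[0])

path = dtw_align(transcribed_notes, score_notes, note_cost)

# Each matched pair is a note-level anchor: score note j is placed at the audio
# onset time of transcribed note i.
for i, j in path:
    print(f"score note {j} (pitch {score_notes[j][1]}) -> audio onset {transcribed_notes[i][0]:.2f}s")
```

The design point is that alignment is transferred through the transcription: once the AMT output is matched to the score symbol by symbol, each score note inherits an audio onset, which is exactly the note-level granularity the frame-level approaches lack.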