Performance-score synchronization is an integral task in signal processing, which entails generating an accurate mapping between an audio recording of a performance and the corresponding musical score. Traditional synchronization methods compute alignment using knowledge-driven and stochastic approaches, and are typically unable to generalize well to different domains and modalities. We present a novel data-driven method for structure-aware performance-score synchronization. We propose a convolutional-attentional architecture trained with a custom loss based on time-series divergence. We conduct experiments on the audio-to-MIDI and audio-to-image alignment tasks pertaining to different score modalities. We validate the effectiveness of our method via ablation studies and comparisons with state-of-the-art alignment approaches. We demonstrate that our approach outperforms previous synchronization methods across a variety of test settings, score modalities, and acoustic conditions. Our method is also robust to structural differences between the performance and score sequences, a common limitation of standard alignment approaches.
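To make the alignment task concrete, the following is a minimal sketch of classic dynamic time warping (DTW), the standard knowledge-driven baseline that the abstract contrasts against; it is not the paper's learned model. The feature sequences here are illustrative 1-D placeholders (real systems would use chroma or spectrogram features).

```python
import numpy as np

def dtw_path(x, y):
    """Align two 1-D feature sequences with classic dynamic time warping.

    Returns the accumulated-cost matrix and the optimal warping path as a
    list of (i, j) index pairs. A generic alignment baseline, not the
    paper's convolutional-attentional method.
    """
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)  # padded accumulated-cost matrix
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])             # local distance
            cost[i, j] = d + min(cost[i - 1, j],     # vertical step
                                 cost[i, j - 1],     # horizontal step
                                 cost[i - 1, j - 1]) # diagonal step
    # Backtrack from (n, m) to recover the warping path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return cost[1:, 1:], path[::-1]

# Toy example: a "performance" that stretches the "score" in time.
score = np.array([1.0, 2.0, 3.0, 2.0, 1.0])
perf = np.array([1.0, 1.0, 2.0, 3.0, 3.0, 2.0, 1.0])
acc, path = dtw_path(perf, score)
```

Because the performance here is only a time-stretched copy of the score, the total alignment cost is zero; DTW breaks down when the performance skips or repeats sections, which is exactly the structural robustness the abstract claims for the proposed method.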