Beat and downbeat tracking models have improved significantly in recent years with the introduction of deep learning methods. Despite these improvements, several challenges remain. In particular, adapting available models to music traditions that are underrepresented in MIR usually means collecting and annotating large amounts of data, which is impractical and time-consuming. Transfer learning, data augmentation, and fine-tuning techniques have been used quite successfully in related tasks and are known to alleviate this bottleneck. Furthermore, when studying these music traditions, models are not required to generalize to multiple mainstream music genres but to perform well in more constrained, homogeneous conditions. In this work, we investigate simple yet effective strategies to adapt beat and downbeat tracking models to two different Latin American music traditions, and we analyze the feasibility of these adaptations in real-world applications in terms of their data and computational requirements. Contrary to common belief, our findings show that it is possible to achieve good performance by spending just a few minutes annotating a portion of the data and training a model on a standard CPU machine, with the precise amount of resources needed depending on the task and the complexity of the dataset.