Extracting the predominant pitch from polyphonic audio is a fundamental task in music information retrieval and computational musicology. Approaching this task with machine learning requires a large amount of labeled audio data to train a model that predicts the pitch contour. However, a classical model pre-trained on data from one domain (the source), e.g., songs of a particular singer or genre, may not perform comparably well when extracting melody from other domains (the target). The performance of such models can be improved by adapting them with some annotated data from the target domain. In this work, we study various adaptation techniques applied to machine learning models for polyphonic melody extraction. Experimental results show that meta-learning-based adaptation performs better than simple fine-tuning. In addition, we find that this method outperforms existing state-of-the-art non-adaptive polyphonic melody extraction algorithms.