Melody extraction is a vital music information retrieval task owing to its potential applications in music pedagogy and the music industry. It is a notoriously challenging task because of the presence of background instruments, and because the melodic source often exhibits characteristics similar to those of the accompanying instruments. The background accompaniment interfering with the vocals makes extracting the melody from the mixture signal considerably more difficult. Until recently, classical signal processing-based melody extraction methods were the most popular among researchers. The ability of deep learning models to scale to large datasets and to learn features automatically by exploiting spatial and temporal dependencies has inspired many researchers to adopt them for melody extraction. In this paper, an attempt has been made to review the up-to-date data-driven deep learning approaches for melody extraction from polyphonic music. The available deep models are categorized by the type of neural network used and by the output representation used to predict the melody. Further, the architectures of 25 melody extraction models are briefly presented. The loss functions used to optimize the model parameters are broadly grouped into four categories, and the loss functions adopted by the various melody extraction models are briefly described. The input representations adopted by the models and their parameter settings are also described in detail. A section on the explainability of black-box melody extraction deep neural networks is included. The performance of the 25 melody extraction methods is compared, and possible future directions to explore and improve melody extraction methods are presented.