In this paper, we study automatic singer identification (SID) in popular music recordings, which aims to recognize who sang a given song. The main challenge is that a singer's voice varies over time and is intertwined with the background accompaniment in the time domain. To address this challenge, we propose KNN-Net for SID, a deep neural network model that learns local timbre feature representations from the mixture of singing voice and background music. Unlike other deep neural networks that use a softmax layer as the output layer, we instead use KNN as a more interpretable output layer to predict the target singer labels. Moreover, an attention mechanism is introduced for the first time to highlight the timbre features crucial for SID. Experiments on the existing artist20 dataset show that the proposed approach outperforms the state-of-the-art method by 4%. We also create the singer32 and singer60 datasets, consisting of Chinese pop music, to further evaluate the reliability of the proposed method. These more extensive experiments likewise indicate that our model achieves a significant performance improvement over state-of-the-art methods.
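The core idea of replacing a softmax output layer with a KNN layer can be illustrated with a minimal sketch: a trained network is assumed to map each audio clip to a fixed-length timbre embedding, and a query clip is then assigned the majority label of its k nearest training embeddings, whose identities also make the decision interpretable. The embedding dimension, singer names, and synthetic Gaussian data below are illustrative placeholders, not the paper's actual pipeline.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Hypothetical 64-dim timbre embeddings from the network's penultimate layer:
# 20 training clips per singer, one synthetic Gaussian cluster per singer.
train_emb = np.vstack([rng.normal(loc=i, scale=0.3, size=(20, 64)) for i in range(3)])
train_labels = np.repeat(["singer_a", "singer_b", "singer_c"], 20)

# KNN output layer: majority vote over the 5 nearest training embeddings.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(train_emb, train_labels)

# A query clip whose embedding lies near singer_b's cluster (loc=1).
query = rng.normal(loc=1, scale=0.3, size=(1, 64))
pred = knn.predict(query)

# Unlike softmax scores, the nearest neighbors themselves explain the label.
dist, idx = knn.kneighbors(query)
neighbor_labels = train_labels[idx[0]]
```

Inspecting `neighbor_labels` shows which specific training clips drove the prediction, which is the interpretability benefit the abstract refers to.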