Singing voice detection (SVD), which identifies the vocal segments of a song, is an essential task in music information retrieval (MIR). The task remains challenging because the singing voice varies widely and is intertwined with the accompaniment, especially in complex polyphonic music such as choral recordings. To address this problem, we perform singing voice detection after suppressing interference from the accompaniment. The proposed SVD method has two steps: (i) a singing voice separation (SVS) technique first coarsely extracts the potential singing-voice component; (ii) exploiting the temporal continuity of vocals, a Long-term Recurrent Convolutional Network (LRCN) learns compositional features. In addition, a median filter smooths the predictions in the time domain to eliminate outliers. Experimental results show that the proposed method outperforms existing state-of-the-art methods on two public datasets, the Jamendo Corpus and the RWC pop dataset.
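The median-filter post-processing step can be illustrated with a minimal pure-Python sketch. It assumes that the classifier outputs one vocal probability per frame, that frames are binarized with a threshold, and that sequence edges are handled by repeating the first and last labels. The function names `median_smooth` and `detect_vocals`, the 0.5 threshold, the kernel sizes, and the edge handling are hypothetical illustration choices, not details taken from the paper.

```python
from statistics import median
from typing import List, Sequence


def median_smooth(labels: Sequence[int], kernel_size: int = 5) -> List[int]:
    """Smooth frame-level 0/1 vocal labels with a sliding median filter.

    Hypothetical sketch: kernel_size must be a positive odd integer so each
    window is centred on the current frame. Edges are padded by repeating
    the first/last label (an assumption), so every window has odd length.
    """
    if kernel_size < 1 or kernel_size % 2 == 0:
        raise ValueError("kernel_size must be a positive odd integer")
    if not labels:
        return []
    half = kernel_size // 2
    padded = [labels[0]] * half + list(labels) + [labels[-1]] * half
    # padded[i + half] corresponds to labels[i]; its window is padded[i : i + kernel_size].
    return [int(median(padded[i:i + kernel_size])) for i in range(len(labels))]


def detect_vocals(probs: Sequence[float], threshold: float = 0.5,
                  kernel_size: int = 5) -> List[int]:
    """Binarize per-frame vocal probabilities, then median-smooth them.

    Both the threshold and the kernel size are assumed values for illustration.
    """
    labels = [1 if p >= threshold else 0 for p in probs]
    return median_smooth(labels, kernel_size)


if __name__ == "__main__":
    # Hypothetical per-frame probabilities: an isolated spike at frame 2 and a
    # one-frame dropout at frame 7 inside a vocal segment.
    probs = [0.1, 0.2, 0.9, 0.1, 0.1, 0.8, 0.9, 0.2, 0.9, 0.9, 0.1]
    print(detect_vocals(probs, kernel_size=3))  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
```

In this toy run, the median filter removes the isolated false positive and fills the one-frame gap. A longer kernel suppresses longer outliers but also blurs the boundaries of short vocal segments.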