Separating a mixture of multiple singing voices into the individual voices is a rarely studied area in music source separation research, and the absence of a benchmark dataset has hindered its progress. In this paper, we present an evaluation dataset and provide baseline studies for multiple singing voices separation. First, we introduce MedleyVox, an evaluation dataset for multiple singing voices separation. We specify the problem definition in this dataset by dividing the problem into four categories: i) duet, ii) unison, iii) main vs. rest, and iv) N-singing separation. Second, we present a strategy for constructing multiple-singing mixtures from various single-singing datasets, which can be used to obtain training data. Third, we propose the improved super-resolution network (iSRNet). Jointly trained with Conv-TasNet and the multi-singing mixture construction strategy, the proposed iSRNet achieved performance comparable to that of ideal time-frequency masks on the duet and unison subsets of MedleyVox. Audio samples, the dataset, and code are available on our GitHub page (https://github.com/jeonchangbin49/MedleyVox).
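To illustrate the general idea of obtaining training data from single-singing datasets, here is a minimal sketch of one generic way to form a duet-style mixture. Two single-singer clips are trimmed to equal length, the second is rescaled so the first sits a chosen number of decibels above it, and the sum is peak-normalized, with the scaled clips kept as separation targets. The function names `rms` and `make_duet_mixture`, the `snr_db` parameter, and the peak normalization are illustrative assumptions. This is not the construction strategy actually used for MedleyVox or iSRNet training.

```python
import math


def rms(x):
    """Root-mean-square level of a list of float samples."""
    return math.sqrt(sum(s * s for s in x) / len(x))


def make_duet_mixture(voice_a, voice_b, snr_db):
    """Hypothetical helper: mix two single-singer clips into a duet.

    voice_b is rescaled so that voice_a is `snr_db` dB louder (by RMS).
    Returns (mixture, target_a, target_b), where mixture == target_a + target_b.
    """
    n = min(len(voice_a), len(voice_b))  # trim both clips to a common length
    a, b = list(voice_a[:n]), list(voice_b[:n])
    if rms(a) == 0 or rms(b) == 0:
        raise ValueError("both clips must contain non-silent audio")

    # Choose gain_b so that 20 * log10(rms(a) / rms(gain_b * b)) == snr_db.
    gain_b = rms(a) / (rms(b) * 10 ** (snr_db / 20))
    b = [s * gain_b for s in b]
    mix = [x + y for x, y in zip(a, b)]

    # Peak-normalize to avoid clipping; scale the targets by the same factor
    # so the mixture stays exactly the sum of the targets.
    peak = max(abs(s) for s in mix)
    if peak > 1.0:
        a = [s / peak for s in a]
        b = [s / peak for s in b]
        mix = [s / peak for s in mix]
    return mix, a, b
```

In a training loop one might draw `snr_db` at random, for example with `random.uniform(-5, 5)`, to vary the balance between the two singers; that range is likewise a hypothetical choice.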