Singer Separation for Karaoke Content Generation

Thanks to the rapid development of deep learning, the singing voice can now be successfully separated from monaural (single-channel) music. However, such separation only isolates the vocals as a whole from the accompanying instruments, which is insufficient for karaoke content generation, where only the lead singers need to be separated. For karaoke applications, we need either to separate a song containing a male-female duet into its two vocal parts, or to extract a single lead vocal from a song containing vocal harmonies. To this end, we propose a singer separation system that generates karaoke content for one or two separated lead singers. In particular, we introduce three models for the singer separation task and design an automatic model selection scheme that determines how many lead singers a song contains. We also collected a sufficiently large dataset, MIR-SingerSeparation, which has been publicly released to advance research in this area. Our singer separation system is best suited to sentimental ballads and can be applied directly to karaoke content generation. To the best of our knowledge, this is the first singer separation work aimed at real-world karaoke applications.

翻译：由于深层学习的迅速发展,我们现在可以成功地将歌声与单声音乐分离。但是,这种分离只能从其他音乐器械中提取人类声音, 这对卡拉OK内容生成应用程序来说是不可取的, 只需要将主唱分开。对于这个卡拉OK应用程序, 我们需要将包含男女二重奏的音乐分为两个音响, 或者从含有声乐的音乐中抽出一个包含声音的独奏声音。因此, 我们在本篇文章中建议使用一个歌手分离系统, 为一两个分开的主唱产生卡拉OK内容。特别是, 我们为歌手分离任务引入了三个模型, 并设计了一个自动模式选择计划, 以区分歌曲中有多少名主唱。我们还收集了一个足够大的数据集, MIR- Singer- 分隔, 已经公开发布该数据集是为了推进这项研究的前沿。我们的歌手分离最适合感情调的球, 并且可以直接应用于卡拉OK内容生成。据我们所知, 这是首个歌手分离工作, 用于真实世界卡拉OK应用程序。