Choral music separation refers to the task of extracting the individual voice parts (e.g., soprano, alto, tenor, and bass) from mixed audio. The lack of datasets has impeded research on this topic: due to copyright issues and the difficulty of data collection, previous work has only been able to train and evaluate models on a few minutes of choral music. In this paper, we investigate the use of synthesized training data for source separation on real choral music. We make three contributions. First, we provide an automated pipeline for synthesizing choral music from sampled instrument plugins, with controllable options for instrument expressiveness; this yields an 8.2-hour choral music dataset derived from the JSB Chorales Dataset, and additional data can easily be synthesized. Second, we evaluate multiple separation models on the choral music separation datasets available from previous work; to the best of our knowledge, this is the first comprehensive evaluation of choral music separation. Third, our experiments demonstrate that the synthesized choral data is of sufficient quality to improve model performance on real choral music datasets. This provides additional experimental evidence and data support for the study of choral music separation.