The performance of music source separation (MSS) models has improved greatly in recent years thanks to the development of novel neural network architectures and training pipelines. However, recent model designs for MSS were mainly motivated by other audio processing tasks or other research fields, while the intrinsic characteristics and patterns of music signals have not been fully explored. In this paper, we propose band-split RNN (BSRNN), a frequency-domain model that explicitly splits the spectrogram of the mixture into subbands and performs interleaved band-level and sequence-level modeling. The bandwidths of the subbands can be chosen using a priori or expert knowledge of the characteristics of the target source in order to optimize performance on a certain type of target musical instrument. To make better use of unlabeled data, we also describe a semi-supervised model finetuning pipeline that can further improve the performance of the model. Experimental results show that BSRNN trained only on the MUSDB18-HQ dataset significantly outperforms several top-ranking models in the Music Demixing (MDX) Challenge 2021, and that the semi-supervised finetuning stage further improves the performance on all four instrument tracks.
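To make the band-split step concrete, the following is a minimal sketch of cutting a mixture spectrogram into contiguous subbands along the frequency axis. It is an illustration under stated assumptions, not the BSRNN implementation: the function name `split_into_subbands`, the toy spectrogram size, and the bandwidth values are all hypothetical and are not taken from the paper.

```python
import numpy as np


def split_into_subbands(spec, bandwidths):
    """Split a (freq_bins, frames) spectrogram into contiguous subbands.

    `bandwidths` lists the number of frequency bins in each subband,
    ordered from low to high frequency (hypothetical helper).
    """
    if sum(bandwidths) != spec.shape[0]:
        raise ValueError("bandwidths must sum to the number of frequency bins")
    # Cumulative sums give the bin indices at which to cut the frequency axis.
    edges = np.cumsum(bandwidths)[:-1]
    return np.split(spec, edges, axis=0)


# Toy complex spectrogram: 16 frequency bins, 5 frames (assumed sizes).
rng = np.random.default_rng(0)
spec = rng.standard_normal((16, 5)) + 1j * rng.standard_normal((16, 5))

# Assumed bandwidths: finer resolution at low frequencies, coarser at high.
bandwidths = [2, 2, 4, 8]
subbands = split_into_subbands(spec, bandwidths)
shapes = [band.shape for band in subbands]
```

Here each subband keeps the full time axis, so a later stage could model the sequence of frames within a band and the relations across bands in turn. Choosing narrower bands where the target instrument carries most of its energy is one way the prior knowledge mentioned above might enter the design.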