Source separation models operate in either the spectrogram domain or the waveform domain. In this work, we show how to perform end-to-end hybrid source separation, letting the model decide which domain is best suited for each source, and even combine both. The proposed hybrid version of the Demucs architecture won the Music Demixing Challenge 2021 organized by Sony. This architecture also comes with additional improvements, such as compressed residual branches, local attention, and singular value regularization. Overall, a 1.4 dB improvement in the Signal-to-Distortion Ratio (SDR) was observed across all sources as measured on the MusDB HQ dataset. This improvement was confirmed by human subjective evaluation, with overall quality rated at 2.83 out of 5 (against 2.36 for the non-hybrid Demucs) and absence of contamination rated at 3.04 (against 2.37 for the non-hybrid Demucs and 2.44 for the second-ranking model submitted to the competition).
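To make the reported SDR gain concrete, the following is a minimal sketch of the basic energy-ratio form of SDR, $10 \log_{10}(\lVert s \rVert^2 / \lVert s - \hat{s} \rVert^2)$. It is an illustration under assumptions, not the evaluation code behind the numbers above: the exact evaluation protocol is not stated here, and the function name `sdr_db`, the `eps` stabilizer, and the toy sine-plus-noise signals are all hypothetical.

```python
import numpy as np


def sdr_db(reference, estimate, eps=1e-10):
    """Signal-to-distortion ratio in dB (basic energy-ratio form).

    Assumption: this is the plain definition 10*log10(|s|^2 / |s - s_hat|^2),
    not necessarily the exact protocol used for the reported results.
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    signal_energy = np.sum(reference ** 2)
    error_energy = np.sum((reference - estimate) ** 2)
    # eps guards against division by zero for silent or perfect estimates.
    return 10.0 * np.log10((signal_energy + eps) / (error_energy + eps))


# Toy data (hypothetical): a 1 s, 440 Hz tone sampled at 44.1 kHz.
t = np.arange(44100) / 44100.0
source = np.sin(2 * np.pi * 440.0 * t)

# Two made-up "estimates" that differ only in the amount of residual noise.
rng = np.random.default_rng(0)
noise = rng.standard_normal(source.shape)
coarse_estimate = source + 0.10 * noise
fine_estimate = source + 0.05 * noise

# Halving the error amplitude raises SDR by 20*log10(2), about 6.02 dB.
gain = sdr_db(source, fine_estimate) - sdr_db(source, coarse_estimate)
print(f"SDR gain: {gain:.2f} dB")
```

Because SDR is logarithmic, a fixed dB gain corresponds to a fixed ratio of error energies: under this definition, a 1.4 dB improvement means the residual error energy shrinks by a factor of about $10^{-0.14} \approx 0.72$.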