Despite phenomenal progress in recent years, state-of-the-art music separation systems produce source estimates with significant perceptual shortcomings, such as adding extraneous noise or removing harmonics. We propose a post-processing model, the Make it Sound Good (MSG) post-processor, to enhance the output of music source separation systems. We apply MSG to state-of-the-art waveform-based and spectrogram-based music source separators, including a separator unseen by MSG during training. Our analysis of the errors produced by source separators shows that waveform models tend to introduce more high-frequency noise, while spectrogram models tend to lose transients and high-frequency content. We introduce objective measures to quantify both kinds of errors and show that MSG improves source reconstruction with respect to both. Crowdsourced subjective evaluations demonstrate that human listeners prefer source estimates of bass and drums that have been post-processed by MSG.