Remixing separated audio sources trades off interferer attenuation against the amount of audible deterioration. This paper proposes a non-intrusive audio quality estimation method for controlling this trade-off in a signal-adaptive manner. The recently proposed 2f-model is adopted as the underlying quality measure, since it has been shown to correlate strongly with basic audio quality in source separation. An alternative operating mode of the measure is proposed that is more appropriate for material in which the target source is inactive for long periods. The 2f-model requires the reference target source as an input, but this reference is not available in many applications. Deep neural networks (DNNs) are trained to estimate the 2f-model intrusively using the reference target (iDNN2f), non-intrusively using the input mix as reference (nDNN2f), and reference-free using only the separated output signal (rDNN2f). It is shown that iDNN2f achieves very strong correlation with the original measure on the test data (Pearson r=0.99), while performance decreases for nDNN2f (r>=0.91) and rDNN2f (r>=0.82). The non-intrusive estimate nDNN2f is mapped to item-dependent remixing gains chosen to maximize the interferer attenuation under a constraint on the minimum quality of the remixed output (e.g., audible but not annoying deteriorations). A listening test shows that this goal is achieved even though the selected gains differ considerably between items (by up to 23 dB).
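The gain-selection idea can be illustrated with a minimal sketch. It assumes that the remix adds the residual `mix - target_est` back to the separated target scaled by a linear gain, evaluates a non-intrusive quality function (mix as reference, as for nDNN2f) on a grid of candidate attenuations, and keeps the largest attenuation that still meets a minimum quality. The names `remix`, `select_gain`, and `toy_quality` are hypothetical. `toy_quality` is only a placeholder for a trained estimator such as nDNN2f and is not the 2f-model; the abstract does not specify how the estimate is mapped to a gain, so this grid search is a stand-in rather than the paper's procedure.

```python
from typing import Callable, List, Sequence

Signal = Sequence[float]


def remix(target_est: Signal, mix: Signal, gain: float) -> List[float]:
    """Keep the separated target and add back the residual interferer
    (mix - target_est) scaled by `gain`: gain=1 returns the mix, gain=0 the target estimate."""
    return [t + gain * (m - t) for t, m in zip(target_est, mix)]


def toy_quality(remixed: Signal, mix: Signal) -> float:
    """HYPOTHETICAL stand-in for a non-intrusive estimator such as nDNN2f.
    NOT the 2f-model: it simply scores 100 when remixed == mix and drops as they diverge."""
    err = sum((r - m) ** 2 for r, m in zip(remixed, mix))
    ref = sum(m ** 2 for m in mix) or 1e-12  # avoid division by zero for silent input
    return 100.0 / (1.0 + err / ref)


def select_gain(
    target_est: Signal,
    mix: Signal,
    quality_fn: Callable[[Signal, Signal], float],
    q_min: float,
    attenuations_db: Sequence[float],
) -> float:
    """Return the largest interferer attenuation (dB) whose remix still satisfies
    quality_fn(remix, mix) >= q_min; 0 dB (the unprocessed mix) if none does."""
    best_db = 0.0
    for att_db in sorted(attenuations_db):
        gain = 10 ** (-att_db / 20)  # dB attenuation -> linear amplitude gain
        if quality_fn(remix(target_est, mix, gain), mix) >= q_min:
            best_db = max(best_db, float(att_db))
    return best_db


if __name__ == "__main__":
    target_est = [1.0, 0.0, -1.0, 0.0]
    mix = [1.0, 1.0, -1.0, 1.0]
    grid = [0, 3, 6, 10, 20, 40]
    print(select_gain(target_est, mix, toy_quality, 85.0, grid))  # -> 6.0
```

Raising `q_min` makes the selection more conservative (less attenuation), which mirrors the trade-off described above; with a real estimator the selected attenuation would vary from item to item.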