Speech enhancement attenuates interfering sounds in speech signals but may introduce artifacts that perceptibly degrade the output signal. We propose a method for controlling the trade-off between the attenuation of the interfering background signal and the loss of sound quality. A deep neural network estimates the attenuation of the separated background signal such that the sound quality, quantified using the Artifact-related Perceptual Score, meets an adjustable target. Subjective evaluations indicate that consistent sound quality is obtained across various input signals. Our experiments show that the proposed method can control the trade-off with an accuracy that is adequate for real-world dialogue enhancement applications.
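The trade-off described above can be illustrated with a minimal Python sketch. It is not the proposed method: the abstract states that a deep neural network estimates the attenuation, whereas this sketch scans a list of candidate attenuations and uses a toy quality predictor as a stand-in. All names (`db_to_gain`, `remix`, `pick_attenuation`, `toy_quality`) and the linear quality model are assumptions made for illustration only.

```python
def db_to_gain(attenuation_db):
    """Convert a non-negative attenuation in dB to a linear amplitude gain."""
    return 10.0 ** (-attenuation_db / 20.0)


def remix(speech, background, attenuation_db):
    """Add the separated background back to the speech estimate, attenuated."""
    gain = db_to_gain(attenuation_db)
    return [s + gain * b for s, b in zip(speech, background)]


def pick_attenuation(predict_quality, target, candidates_db):
    """Return the largest candidate attenuation whose predicted quality
    meets the target; fall back to the smallest candidate if none does."""
    feasible = [a for a in candidates_db if predict_quality(a) >= target]
    return max(feasible) if feasible else min(candidates_db)


def toy_quality(attenuation_db):
    """Hypothetical stand-in for a quality estimate: the score drops
    linearly as more attenuation is applied. Not a real quality model."""
    return 100.0 - 2.0 * attenuation_db


if __name__ == "__main__":
    # Hypothetical separated signals (a few samples each).
    speech = [1.0, 0.5, -0.25]
    background = [1.0, -1.0, 0.5]

    # Strongest attenuation that keeps the toy score at or above the target.
    attenuation_db = pick_attenuation(toy_quality, 80.0, [0, 5, 10, 15, 20])
    output = remix(speech, background, attenuation_db)
    print(attenuation_db, output)
```

Raising the target makes the selection more conservative (less attenuation, fewer artifacts), while lowering it allows stronger background suppression. This is the adjustable trade-off the abstract refers to, shown here only under the toy assumptions stated above.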