Speech enhancement improves speech quality and benefits a range of downstream tasks. However, most current speech enhancement work is devoted to improving downstream automatic speech recognition (ASR), and only a relatively small amount of work has focused on automatic speaker verification (ASV). In this work, we propose MVNet, which consists of a memory assistance module that improves the performance of downstream ASR and a vocal reinforcement module that boosts the performance of ASV. In addition, we design a new loss function to improve speaker vocal similarity. Experimental results on the Libri2mix dataset show that our method outperforms baseline methods on several metrics, including speech quality, intelligibility, and speaker vocal similarity.