The discrepancy between the cost function used to train a speech enhancement model and human auditory perception often leaves the quality of the enhanced speech unsatisfactory. Objective evaluation metrics that account for human perception can therefore help bridge this gap. Our previously proposed MetricGAN was designed to optimize objective metrics by connecting the metric with a discriminator. Because only the scores of the target evaluation functions are needed during training, the metrics can even be non-differentiable. In this study, we propose MetricGAN+, which introduces three training techniques that incorporate domain knowledge of speech processing. With these techniques, experimental results on the VoiceBank-DEMAND dataset show that MetricGAN+ increases the PESQ score by 0.3 compared to the previous MetricGAN and achieves state-of-the-art results (PESQ score = 3.15).
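The idea of connecting a metric to a discriminator can be illustrated with a minimal sketch. It assumes the least-squares objectives commonly associated with MetricGAN: the discriminator regresses the normalized metric score, and the generator is pushed toward the maximum score. The names `toy_metric`, `discriminator_loss`, and `generator_loss` are hypothetical, and `toy_metric` is only a stand-in for a real black-box metric such as PESQ; the three MetricGAN+ training techniques are not shown.

```python
from typing import Callable, Sequence

Signal = Sequence[float]
Scorer = Callable[[Signal, Signal], float]


def toy_metric(enhanced: Signal, clean: Signal) -> float:
    """Hypothetical stand-in for a black-box metric normalized to [0, 1].

    The rounding step makes it non-differentiable, which is acceptable
    because training only ever consumes the returned score.
    """
    err = sum(abs(e - c) for e, c in zip(enhanced, clean)) / len(clean)
    return round(max(0.0, 1.0 - err), 1)


def discriminator_loss(d: Scorer, metric: Scorer,
                       enhanced: Signal, clean: Signal) -> float:
    """Assumed objective: D predicts 1 for (clean, clean) and the true
    metric score for (enhanced, clean)."""
    return ((d(clean, clean) - 1.0) ** 2
            + (d(enhanced, clean) - metric(enhanced, clean)) ** 2)


def generator_loss(d: Scorer, enhanced: Signal, clean: Signal,
                   target: float = 1.0) -> float:
    """Assumed objective: G is updated so that D's predicted score
    approaches the target. In a real system D is a neural network, so
    gradients flow through D rather than through the metric."""
    return (d(enhanced, clean) - target) ** 2


clean = [0.0, 0.5, 1.0]
enhanced = [0.1, 0.5, 0.8]

# A discriminator that already matches the metric incurs zero loss.
perfect_d = toy_metric
d_loss = discriminator_loss(perfect_d, toy_metric, enhanced, clean)
g_loss = generator_loss(perfect_d, enhanced, clean)
```

In this sketch the metric is called only to produce a regression target for the discriminator, which is why it never needs to be differentiated.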