Speech enhancement is an essential task for improving speech quality in noisy scenarios. Several state-of-the-art approaches have introduced visual information for speech enhancement, since the visual aspect of speech is essentially unaffected by the acoustic environment. This paper proposes a novel framework that incorporates visual information into speech enhancement by means of a Generative Adversarial Network (GAN). In particular, the proposed visual speech enhancement GAN consists of two networks trained in an adversarial manner: i) a generator that adopts a multi-layer feature fusion convolutional network to enhance the input noisy speech, and ii) a discriminator that attempts to minimize the discrepancy between the distributions of the clean speech signal and the enhanced speech signal. Experimental results demonstrated the superior performance of the proposed model against several state-of-the-art approaches.
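As a rough illustration of the adversarial training described above, the sketch below computes per-batch losses for the two networks. It assumes a least-squares GAN objective plus an L1 reconstruction term on the generator, as used in earlier GAN-based speech enhancement work such as SEGAN. The abstract does not state which loss the proposed model uses, so the loss form, the function names, and the `l1_weight` value are hypothetical. Discriminator outputs are passed in as plain lists of scores. In the proposed model, visual features would also condition the networks, and that conditioning is omitted here.

```python
def _mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    values = list(values)
    return sum(values) / len(values)


def discriminator_loss(d_clean, d_enhanced):
    """Least-squares discriminator loss (assumed LSGAN form).

    d_clean:    discriminator scores for clean speech; target label 1.
    d_enhanced: discriminator scores for enhanced speech; target label 0.
    """
    real_term = _mean((s - 1.0) ** 2 for s in d_clean)
    fake_term = _mean(s ** 2 for s in d_enhanced)
    return 0.5 * (real_term + fake_term)


def generator_loss(d_enhanced, enhanced, clean, l1_weight=100.0):
    """Least-squares adversarial term plus a hypothetical L1 reconstruction term.

    The adversarial term pushes the discriminator's scores on enhanced speech
    toward the "clean" label 1; the L1 term keeps the enhanced waveform close
    to the clean reference sample by sample.
    """
    adv_term = 0.5 * _mean((s - 1.0) ** 2 for s in d_enhanced)
    l1_term = _mean(abs(e - c) for e, c in zip(enhanced, clean))
    return adv_term + l1_weight * l1_term
```

For example, a discriminator that scores clean speech as 1.0 and enhanced speech as 0.0 gets `discriminator_loss([1.0, 1.0], [0.0, 0.0])` equal to 0.0, while the generator then pays an adversarial penalty of 0.5 even when its output matches the clean samples exactly. Alternating the two updates is what drives the enhanced-speech distribution toward the clean-speech distribution.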