VoiceFixer: Toward General Speech Restoration With Neural Vocoder

Speech restoration aims to remove distortions in speech signals. Prior methods mainly focus on single-task speech restoration (SSR), such as speech denoising or speech declipping. However, SSR systems address only one task each and do not solve the general speech restoration problem. In addition, previous SSR systems show limited performance in some speech restoration tasks such as speech super-resolution. To overcome these limitations, we propose a general speech restoration (GSR) task that attempts to remove multiple distortions simultaneously. Furthermore, we propose VoiceFixer, a generative framework to address the GSR task. VoiceFixer consists of an analysis stage and a synthesis stage to mimic the speech analysis and comprehension of the human auditory system. We employ a ResUNet to model the analysis stage and a neural vocoder to model the synthesis stage. We evaluate VoiceFixer with additive noise, room reverberation, low-resolution, and clipping distortions. Our baseline GSR model achieves a 0.499 higher mean opinion score (MOS) than the speech enhancement SSR model. VoiceFixer further surpasses the GSR baseline model by 0.256 MOS. Moreover, we observe that VoiceFixer generalizes well to severely degraded real speech recordings, indicating its potential in restoring old movies and historical speeches. The source code is available at https://github.com/haoheliu/voicefixer_main.
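
To make the four distortion types named above concrete, here is a minimal NumPy sketch of how a clean signal might be degraded with all of them at once. It is an illustration only, not the authors' data pipeline: the function names, the synthetic test tone, the toy impulse response, and every parameter value are assumptions chosen for the example.

```python
import numpy as np

def add_noise(speech, noise, snr_db):
    """Mix noise into speech at the requested signal-to-noise ratio (dB)."""
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale noise so that speech_power / scaled_noise_power == 10^(snr_db / 10).
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise

def add_reverb(speech, impulse_response):
    """Convolve with a room impulse response, keeping the original length."""
    return np.convolve(speech, impulse_response)[: len(speech)]

def reduce_resolution(speech, factor):
    """Crudely lower the effective sampling rate: keep every `factor`-th
    sample and repeat it (no anti-aliasing filter; illustration only)."""
    return np.repeat(speech[::factor], factor)[: len(speech)]

def clip(speech, threshold):
    """Hard-clip samples to the range [-threshold, threshold]."""
    return np.clip(speech, -threshold, threshold)

# Hypothetical inputs: a 1-second 220 Hz tone stands in for clean speech.
rng = np.random.default_rng(0)
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
clean = 0.5 * np.sin(2 * np.pi * 220 * t)
noise = rng.standard_normal(sample_rate)

# Toy impulse response: exponentially decaying noise (an assumption,
# not a measured room response).
rir = rng.standard_normal(800) * np.exp(-np.arange(800) / 100.0)
rir[0] = 1.0

# Apply all four distortions in sequence to build one degraded input.
degraded = add_reverb(clean, rir)
degraded = add_noise(degraded, noise, snr_db=10)
degraded = reduce_resolution(degraded, factor=2)
degraded = clip(degraded, threshold=0.25)
```

A single-task system would be trained to undo only one of these steps, whereas the GSR task described above asks one model to recover `clean` from `degraded` with all of them applied.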
