VoooFixer:争取用神经导音器全面恢复语音 (VoiceFixer: Toward General Speech Restoration With Neural Vocoder) - 专知论文

会员服务 ·

0

语音增强 · MoDELS · Performer · 得分 · 基准 ·

2021 年 9 月 28 日

VoiceFixer: Toward General Speech Restoration With Neural Vocoder

翻译：VoooFixer:争取用神经导音器全面恢复语音

Haohe Liu,Qiuqiang Kong,Qiao Tian,Yan Zhao,DeLiang Wang,Chuanzeng Huang,Yuxuan Wang

Speech restoration aims to remove distortions in speech signals. Prior methods mainly focus on single-task speech restoration(SSR), such as speech enhancement or speech declipping. However, SSR systems only focus on one task and do not address the general speech restoration problem. Previous SSR systems also have limited performance in speech restoration tasks such as speech super-resolution. To overcome those limitations, we propose a general speech restoration(GSR) task that attempts to remove multiple distortions simultaneously. Furthermore, we propose VoiceFixer, a generative framework to address the GSR tasks. VoiceFixer consists of an analysis stage and a synthesis stage to mimic the speech analysis and comprehension of the human auditory system. We employ a ResUNet to model the analysis module and a neural vocoder to model the synthesis module. We evaluate VoiceFixer with additive noise, room reverberation, low-resolution, and clipping distortions. Our baseline GSR model achieves a 0.499 higher mean opinion score(MOS) than the speech enhancement SSR model. VoiceFixer further surpasses the GSR baseline model on the MOS score by 0.256. In addition, we observe that VoiceFixer generalizes well to severely degraded real speech recordings, indicating its potential in restoring old movies and historical speeches. The source code is available at https://github.com/haoheliu/voicefixer_main.

翻译：恢复语音的目的是消除语音信号中的扭曲现象; 先前的方法主要侧重于单一任务语言恢复(SSR),如语音增强或语音解析等。然而, SSR系统仅侧重于一个任务,而没有解决一般语音恢复问题。以前的SSR系统在语音恢复任务(如超分辨率)中的表现也有限。为了克服这些限制,我们提议了一般性语音恢复(GSR)任务,以同时消除多重扭曲现象。此外,我们提议了语音恢复(GSR)任务,这是一个处理GSR任务的发源框架。语音Fixer是一个分析阶段和一个合成阶段,以模拟语音分析和理解人类听力系统。我们使用 ResUNet来模拟分析模块,而神经电算器来模拟合成模块。我们用添加噪音、房间回动、低分辨率和剪动扭曲来评估语音恢复器。我们的基线GSR模型比加强语音改革模型的平均值高出0.499。语音Fix进一步超越了MOS的GSR基线模型, 以0.256/ helmasium 进行模拟。此外,我们观察了老式的系统代码源代码, 正在重化。

0

相关内容

语音增强

语音增强是指当语音信号被各种各样的噪声干扰、甚至淹没后，从噪声背景中提取有用的语音信号，抑制、降低噪声干扰的技术。一句话，从含噪语音中提取尽可能纯净的原始语音。

最新《Transformers模型》教程，64页ppt

最新《Transformers模型》教程，64页ppt

专知会员服务

324+阅读 · 2020年11月26日

【ACL2020-亚马逊】Transformers多分辨率和多模态语音识别，Multiresolution and Multimodal Speech Recognition with Transformers

【ACL2020-亚马逊】Transformers多分辨率和多模态语音识别，Multiresolution and Multimodal Speech Recognition with Transformers

专知会员服务

15+阅读 · 2020年5月5日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【NLP| 推荐文章】语言语音处理（Speech and Language Processing(3rd ed.draft)）

专知会员服务

15+阅读 · 2019年11月24日

【ICCV 2019 Workshop】Adaptive Confidence Smoothing for Generalized Zero-Shot Learning，巴伊兰大学 Yuval Atzmon

【ICCV 2019 Workshop】Adaptive Confidence Smoothing for Generalized Zero-Shot Learning，巴伊兰大学 Yuval Atzmon

专知会员服务

13+阅读 · 2019年10月31日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

计算机视觉最佳实践、代码示例和相关文档

计算机视觉最佳实践、代码示例和相关文档

专知会员服务

20+阅读 · 2019年10月9日

【资源】语音增强资源集锦

【资源】语音增强资源集锦

专知

8+阅读 · 2020年7月4日

RoBERTa中文预训练模型：RoBERTa for Chinese

RoBERTa中文预训练模型：RoBERTa for Chinese

PaperWeekly

57+阅读 · 2019年9月16日

【Github】All4NLP：自然语言处理相关资源整理

【Github】All4NLP：自然语言处理相关资源整理

AINLP

23+阅读 · 2019年8月9日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

详解GAN的谱归一化（Spectral Normalization）

详解GAN的谱归一化（Spectral Normalization）

PaperWeekly

11+阅读 · 2019年2月13日

2018机器学习开源资源盘点

2018机器学习开源资源盘点

专知

6+阅读 · 2019年2月2日

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

专知

13+阅读 · 2018年6月24日

语音顶级会议Interspeech2018接受论文列表！

语音顶级会议Interspeech2018接受论文列表！

专知

6+阅读 · 2018年6月10日

最新5篇生成对抗网络相关论文推荐—FusedGAN、DeblurGAN、AdvGAN、CipherGAN、MMD GANS

最新5篇生成对抗网络相关论文推荐—FusedGAN、DeblurGAN、AdvGAN、CipherGAN、MMD GANS

专知

23+阅读 · 2018年1月18日

gan生成图像at 1024² 的代码论文

gan生成图像at 1024² 的代码论文

CreateAMind

4+阅读 · 2017年10月31日

DelightfulTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2021

Arxiv

0+阅读 · 2021年11月19日

A Conformer-based ASR Frontend for Joint Acoustic Echo Cancellation, Speech Enhancement and Speech Separation

Arxiv

0+阅读 · 2021年11月18日

A Trainable Spectral-Spatial Sparse Coding Model for Hyperspectral Image Restoration

A Trainable Spectral-Spatial Sparse Coding Model for Hyperspectral Image Restoration

Arxiv

0+阅读 · 2021年11月18日

BLOOM-Net: Blockwise Optimization for Masking Networks Toward Scalable and Efficient Speech Enhancement

Arxiv

0+阅读 · 2021年11月17日

High Quality Streaming Speech Synthesis with Low, Sentence-Length-Independent Latency

Arxiv

0+阅读 · 2021年11月17日

A Survey on Neural Speech Synthesis

Arxiv

14+阅读 · 2021年6月30日

MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis

MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis

Arxiv

7+阅读 · 2019年10月8日

Improved Speech Enhancement with the Wave-U-Net

Arxiv

8+阅读 · 2018年11月27日

ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks

ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks

Arxiv

6+阅读 · 2018年9月17日

Self-Attention Generative Adversarial Networks

Arxiv

8+阅读 · 2018年5月21日

VIP会员

文章信息

相关主题

相关VIP内容

最新《Transformers模型》教程，64页ppt

最新《Transformers模型》教程，64页ppt

专知会员服务

324+阅读 · 2020年11月26日

【ACL2020-亚马逊】Transformers多分辨率和多模态语音识别，Multiresolution and Multimodal Speech Recognition with Transformers

【ACL2020-亚马逊】Transformers多分辨率和多模态语音识别，Multiresolution and Multimodal Speech Recognition with Transformers

专知会员服务

15+阅读 · 2020年5月5日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【NLP| 推荐文章】语言语音处理（Speech and Language Processing(3rd ed.draft)）

专知会员服务

15+阅读 · 2019年11月24日

【ICCV 2019 Workshop】Adaptive Confidence Smoothing for Generalized Zero-Shot Learning，巴伊兰大学 Yuval Atzmon

【ICCV 2019 Workshop】Adaptive Confidence Smoothing for Generalized Zero-Shot Learning，巴伊兰大学 Yuval Atzmon

专知会员服务

13+阅读 · 2019年10月31日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

计算机视觉最佳实践、代码示例和相关文档

计算机视觉最佳实践、代码示例和相关文档

专知会员服务

20+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

Apium加入红猫未来计划：推进战术无人机集群自主技术

《美陆军训练条令：反小型无人机系统（C-sUAS）炮术项目》2025最新80页

无人机如何改变战争？未来战场

《超大城市作战艺术》52页报告

相关资讯

【资源】语音增强资源集锦

【资源】语音增强资源集锦

专知

8+阅读 · 2020年7月4日

RoBERTa中文预训练模型：RoBERTa for Chinese

RoBERTa中文预训练模型：RoBERTa for Chinese

PaperWeekly

57+阅读 · 2019年9月16日

【Github】All4NLP：自然语言处理相关资源整理

【Github】All4NLP：自然语言处理相关资源整理

AINLP

23+阅读 · 2019年8月9日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

详解GAN的谱归一化（Spectral Normalization）

详解GAN的谱归一化（Spectral Normalization）

PaperWeekly

11+阅读 · 2019年2月13日

2018机器学习开源资源盘点

2018机器学习开源资源盘点

专知

6+阅读 · 2019年2月2日

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

专知

13+阅读 · 2018年6月24日

语音顶级会议Interspeech2018接受论文列表！

语音顶级会议Interspeech2018接受论文列表！

专知

6+阅读 · 2018年6月10日

最新5篇生成对抗网络相关论文推荐—FusedGAN、DeblurGAN、AdvGAN、CipherGAN、MMD GANS

最新5篇生成对抗网络相关论文推荐—FusedGAN、DeblurGAN、AdvGAN、CipherGAN、MMD GANS

专知

23+阅读 · 2018年1月18日

gan生成图像at 1024² 的代码论文

gan生成图像at 1024² 的代码论文

CreateAMind

4+阅读 · 2017年10月31日

相关论文

DelightfulTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2021

Arxiv

0+阅读 · 2021年11月19日

A Conformer-based ASR Frontend for Joint Acoustic Echo Cancellation, Speech Enhancement and Speech Separation

Arxiv

0+阅读 · 2021年11月18日

A Trainable Spectral-Spatial Sparse Coding Model for Hyperspectral Image Restoration

A Trainable Spectral-Spatial Sparse Coding Model for Hyperspectral Image Restoration

Arxiv

0+阅读 · 2021年11月18日

BLOOM-Net: Blockwise Optimization for Masking Networks Toward Scalable and Efficient Speech Enhancement

Arxiv

0+阅读 · 2021年11月17日

High Quality Streaming Speech Synthesis with Low, Sentence-Length-Independent Latency

Arxiv

0+阅读 · 2021年11月17日

A Survey on Neural Speech Synthesis

Arxiv

14+阅读 · 2021年6月30日

MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis

MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis

Arxiv

7+阅读 · 2019年10月8日

Improved Speech Enhancement with the Wave-U-Net

Arxiv

8+阅读 · 2018年11月27日

ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks

ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks

Arxiv

6+阅读 · 2018年9月17日

Self-Attention Generative Adversarial Networks

Arxiv

8+阅读 · 2018年5月21日

微信扫码咨询专知VIP会员