Joint magnitude estimation and phase recovery using Cycle-in-cycle GAN for non-parallel speech enhancement - 专知 Papers

Estimation/Estimator · Speech Enhancement · CycleGAN · INFORMS · Pair

September 26, 2021

Joint magnitude estimation and phase recovery using Cycle-in-cycle GAN for non-parallel speech enhancement

Guochen Yu, Andong Li, Yutian Wang, Yinuo Guo, Chengshi Zheng, Hui Wang

From arXiv; submitted to ICASSP 2022 (5 pages)

Because adequate paired noisy-clean speech corpora are lacking in many real scenarios, non-parallel training is a promising approach for DNN-based speech enhancement methods. However, because of the severe mismatch between input and target speech, many previous studies focus only on magnitude spectrum estimation and leave the phase unaltered, resulting in degraded speech quality under low signal-to-noise ratio conditions. To tackle this problem, we decouple the difficult optimization target w.r.t. the original spectrum into spectral magnitude and phase, and propose a novel Cycle-in-cycle generative adversarial network (dubbed CinCGAN) to jointly estimate the spectral magnitude and phase information stage by stage. In the first stage, we pretrain a magnitude CycleGAN to coarsely denoise the spectral magnitude. In the second stage, we couple the pretrained CycleGAN with a complex-valued CycleGAN as a cycle-in-cycle structure to recover phase information and refine the spectral magnitude simultaneously. The experimental results on the VoiceBank + DEMAND dataset show that the proposed approach significantly outperforms previous baselines under non-parallel training. Experiments on training the models with standard paired data also show that the proposed method can achieve remarkable performance.
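The decoupling of the complex spectrum into magnitude and phase, and the cost of leaving the phase unaltered, can be illustrated with a minimal NumPy sketch. It is not the paper's model: the random arrays stand in for STFT spectrograms, oracle magnitude and phase replace the trained generators, and all names (`split_mag_phase`, `combine_mag_phase`, `cycle_consistency_l1`, `to_clean`, `to_noisy`) are hypothetical. The L1 cycle loss is the generic CycleGAN form, assumed here for illustration only.

```python
import numpy as np


def split_mag_phase(spec):
    """Decouple a complex spectrum into magnitude and phase."""
    return np.abs(spec), np.angle(spec)


def combine_mag_phase(mag, phase):
    """Recombine magnitude and phase into a complex spectrum."""
    return mag * np.exp(1j * phase)


def cycle_consistency_l1(x, g_forward, g_backward):
    """Generic L1 cycle loss: mapping x forward and back should return x."""
    return float(np.mean(np.abs(g_backward(g_forward(x)) - x)))


rng = np.random.default_rng(0)
shape = (5, 4)  # (frequency bins, frames); toy size, an assumption

# Toy complex "spectrograms" standing in for STFTs of clean and noisy speech.
clean = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
noise = 0.5 * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))
noisy = clean + noise

clean_mag, clean_phase = split_mag_phase(clean)
noisy_mag, noisy_phase = split_mag_phase(noisy)

# Magnitude-only enhancement: even a perfect (oracle) magnitude estimate
# is recombined with the unaltered noisy phase.
mag_only = combine_mag_phase(clean_mag, noisy_phase)

# Joint estimate: oracle magnitude together with oracle phase.
joint = combine_mag_phase(clean_mag, clean_phase)

err_mag_only = float(np.mean(np.abs(mag_only - clean)))
err_joint = float(np.mean(np.abs(joint - clean)))

# Toy invertible "generators" on magnitudes; real generators are networks.
to_clean = lambda m: 0.5 * m
to_noisy = lambda m: 2.0 * m
cycle_loss = cycle_consistency_l1(noisy_mag, to_clean, to_noisy)

print("error with noisy phase kept:", err_mag_only)
print("error with phase recovered: ", err_joint)
print("cycle loss of toy generators:", cycle_loss)
```

In this toy setting the residual error of the magnitude-only reconstruction comes entirely from the phase mismatch, which is the gap a second, complex-valued stage would aim to close.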
