Emotional Voice Conversion (EVC) aims to convert the emotional style of a source speech signal to a target style while preserving its content and speaker identity. Previous emotional conversion studies do not disentangle emotional information from the emotion-independent information that should be preserved, and therefore transform the signal monolithically, producing low-quality audio with linguistic distortions. To address this distortion problem, we propose a novel StarGAN framework with a two-stage training process that separates emotional features from emotion-independent ones by using an autoencoder with two encoders as the generator of the Generative Adversarial Network (GAN). The proposed model achieves favourable results in both objective and subjective evaluations of distortion, showing that it can effectively reduce distortion. Furthermore, in data augmentation experiments for end-to-end speech emotion recognition, the proposed StarGAN model achieves an increase of 2% in Micro-F1 and 5% in Macro-F1 over the baseline StarGAN model, indicating that the proposed model is more valuable for data augmentation.
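To make the two-encoder generator design concrete, the following is a minimal sketch of how such a module could look, assuming mel-spectrogram input features and a one-hot target-emotion label in a StarGAN-style many-to-many setting. All module names, layer sizes, the `n_emotions` parameter, and the PyTorch framing are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch (not the authors' implementation) of an autoencoder
# generator with two encoders: one extracts emotion-independent content
# features, the other extracts emotional style, and a decoder recombines
# the content with a target-emotion label for conversion.
import torch
import torch.nn as nn

class TwoEncoderGenerator(nn.Module):
    def __init__(self, feat_dim: int = 80, hidden: int = 256, n_emotions: int = 4):
        super().__init__()
        # Content encoder: captures linguistic/speaker information
        # that must be preserved across the conversion.
        self.content_encoder = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
        # Emotion encoder: captures the source emotional style; its output
        # would be used by disentanglement losses during training.
        self.emotion_encoder = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
        # Decoder conditions on content features plus a one-hot
        # target-emotion label, as in StarGAN-style conversion.
        self.decoder = nn.Sequential(
            nn.Linear(hidden + n_emotions, hidden), nn.ReLU(),
            nn.Linear(hidden, feat_dim))

    def forward(self, x: torch.Tensor, target_emotion: torch.Tensor):
        # x: (batch, frames, feat_dim) acoustic features, e.g. mel-spectrogram
        # target_emotion: (batch, n_emotions) one-hot target label
        content = self.content_encoder(x)   # emotion-independent features
        emotion = self.emotion_encoder(x)   # source emotional style
        label = target_emotion.unsqueeze(1).expand(-1, x.size(1), -1)
        converted = self.decoder(torch.cat([content, label], dim=-1))
        return converted, content, emotion
```

In such a setup, the `converted` output would be fed to a StarGAN discriminator and domain classifier during adversarial training, while the separate `content` and `emotion` embeddings allow reconstruction and disentanglement objectives in a two-stage process; the exact losses and stages are as described in the paper, and this sketch only illustrates the separation of the two feature streams.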