Non-Parallel Voice Conversion for ASR Augmentation - Zhuanzhi Papers (专知论文)

Speech recognition · Robustness · Data augmentation · motivation · MoDELS

September 15, 2022

Non-Parallel Voice Conversion for ASR Augmentation

Gary Wang,Andrew Rosenberg,Bhuvana Ramabhadran,Fadi Biadsy,Yinghui Huang,Jesse Emond,Pedro Moreno Mengibar

From arXiv; accepted at Interspeech 2022

Automatic speech recognition (ASR) needs to be robust to speaker differences. Voice Conversion (VC) modifies speaker characteristics of input speech. This is an attractive feature for ASR data augmentation. In this paper, we demonstrate that voice conversion can be used as a data augmentation technique to improve ASR performance, even on LibriSpeech, which contains 2,456 speakers. For ASR augmentation, it is necessary that the VC model be robust to a wide range of input speech. This motivates the use of a non-autoregressive, non-parallel VC model, and the use of a pretrained ASR encoder within the VC model. This work suggests that despite including many speakers, speaker diversity may remain a limitation to ASR quality. Finally, interrogation of our VC performance has provided useful metrics for objective evaluation of VC quality.

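Speech recognition

Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methods and technologies enabling computers to recognize spoken language and convert it into text. It is also known as automatic speech recognition (ASR), computer speech recognition, or speech-to-text (STT). It draws on knowledge and research from computer science, linguistics, and computer engineering.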
