混合电话BERT:用混合电话和 Sup-Phoneme 语音发言稿表改进BERT (Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech) - 专知论文

会员服务 ·

0

音素 · BERT · 语音合成 · Learning · MoDELS ·

2022 年 6 月 29 日

Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech

翻译：混合电话BERT:用混合电话和 Sup-Phoneme 语音发言稿表改进BERT

Guangyan Zhang,Kaitao Song,Xu Tan,Daxin Tan,Yuzi Yan,Yanqing Liu,Gang Wang,Wei Zhou,Tao Qin,Tan Lee,Sheng Zhao

from arxiv, Accepted by interspeech 2022

Recently, leveraging BERT pre-training to improve the phoneme encoder in text to speech (TTS) has drawn increasing attention. However, the works apply pre-training with character-based units to enhance the TTS phoneme encoder, which is inconsistent with the TTS fine-tuning that takes phonemes as input. Pre-training only with phonemes as input can alleviate the input mismatch but lack the ability to model rich representations and semantic information due to limited phoneme vocabulary. In this paper, we propose MixedPhoneme BERT, a novel variant of the BERT model that uses mixed phoneme and sup-phoneme representations to enhance the learning capability. Specifically, we merge the adjacent phonemes into sup-phonemes and combine the phoneme sequence and the merged sup-phoneme sequence as the model input, which can enhance the model capacity to learn rich contextual representations. Experiment results demonstrate that our proposed Mixed-Phoneme BERT significantly improves the TTS performance with 0.30 CMOS gain compared with the FastSpeech 2 baseline. The Mixed-Phoneme BERT achieves 3x inference speedup and similar voice quality to the previous TTS pre-trained model PnG BERT

翻译：最近,为改进语音和语音文字中的语音编码器(TTS)而利用BERT的预培训来改进语音编码器(TTS)引起了越来越多的注意。然而,该工作采用了与字符基单位的预先培训,以加强TTS的电话编码器,这与TTS将手机编码器用作输入器的微调不符。只有将电话作为输入输入器的TTS微调才能进行预培训,只有将电话作为输入器进行微调才能减轻输入不匹配,但由于电话词汇有限,无法模拟丰富的表达和语义信息。在本文中,我们提议混合Phoneme BERT模式的一个新变型模式,即使用混合电话和超声频表示器来增强学习能力。具体地说,我们将相邻的电话编码器合并成Sup-phoneme,并将电话序列和合并的语音序列作为模型投入,这可以提高模型学习丰富的背景表述能力。实验结果表明,我们提议的混合-Phoneme BERTER的性能大大改进了TTS的性,与快速Speech 2基线相比,增加了0.30 CMOS。混合PERME模型达到前的快速质量,与前TERTUTERTUTS-CTUTSerging 3x。

0

相关内容

【斯坦福大学AI】BERT, ELMo， & GPT-2:上下文化的单词表示是怎样的?

专知会员服务

35+阅读 · 2020年3月28日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

BERT进展2019四篇必读论文

BERT进展2019四篇必读论文

专知会员服务

69+阅读 · 2020年1月2日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

2019年机器学习框架回顾

2019年机器学习框架回顾

专知会员服务

36+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

征稿 | CFP：Special Issue of NLP and KG(JCR Q2，IF2.67)

征稿 | CFP：Special Issue of NLP and KG(JCR Q2，IF2.67)

开放知识图谱

1+阅读 · 2022年4月4日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

中国图象图形学学会CSIG

0+阅读 · 2021年11月3日

RoBERTa for Chinese：大规模中文预训练RoBERTa模型

RoBERTa for Chinese：大规模中文预训练RoBERTa模型

AINLP

30+阅读 · 2019年9月8日

BERT/Transformer/迁移学习NLP资源大列表

BERT/Transformer/迁移学习NLP资源大列表

专知

19+阅读 · 2019年6月9日

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

AINLP

40+阅读 · 2019年6月9日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

视觉信息的局部特征表示及应用研究

国家自然科学基金

2+阅读 · 2015年12月31日

溶剂辅助激发态链式长程多质子转移反应动力学的理论研究

国家自然科学基金

0+阅读 · 2013年12月31日

高效光纤型SERS探针的界面设计、可控制备及性能研究

国家自然科学基金

0+阅读 · 2013年12月31日

Perp在类风湿性关节炎外周Th17细胞存活中的作用研究

国家自然科学基金

0+阅读 · 2012年12月31日

汉藏双语个性化多语种语音合成中的语言建模的研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于替换的实时Web服务事务处理

国家自然科学基金

0+阅读 · 2012年12月31日

框架的冗余度

国家自然科学基金

0+阅读 · 2012年12月31日

全树枝状磷光分子简单混合构筑发光层的白光有机电致发光器件

国家自然科学基金

0+阅读 · 2012年12月31日

生物活性导向的麦角碱的多样性合成

国家自然科学基金

0+阅读 · 2012年12月31日

约化群酉表示的branching law及其应用

国家自然科学基金

0+阅读 · 2009年12月31日

GIT: A Generative Image-to-text Transformer for Vision and Language

Arxiv

1+阅读 · 2022年8月22日

A Novel Multi-Task Learning Approach for Context-Sensitive Compound Type Identification in Sanskrit

A Novel Multi-Task Learning Approach for Context-Sensitive Compound Type Identification in Sanskrit

Arxiv

0+阅读 · 2022年8月22日

Linear regression and its inference on noisy network-linked data

Arxiv

0+阅读 · 2022年8月19日

Vector Quantized Diffusion Model with CodeUnet for Text-to-Sign Pose Sequences Generation

Arxiv

0+阅读 · 2022年8月19日

Analyzing Robustness of End-to-End Neural Models for Automatic Speech Recognition

Arxiv

0+阅读 · 2022年8月17日

Attention Bottlenecks for Multimodal Fusion

Arxiv

31+阅读 · 2021年6月30日

PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization

Arxiv

17+阅读 · 2020年6月2日

UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training

Arxiv

15+阅读 · 2020年2月28日

UniViLM: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation

UniViLM: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation

Arxiv

19+阅读 · 2020年2月15日

Pre-Training with Whole Word Masking for Chinese BERT

Arxiv

11+阅读 · 2019年6月19日

VIP会员

文章信息

相关主题

相关VIP内容

【斯坦福大学AI】BERT, ELMo， & GPT-2:上下文化的单词表示是怎样的?

专知会员服务

35+阅读 · 2020年3月28日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

BERT进展2019四篇必读论文

BERT进展2019四篇必读论文

专知会员服务

69+阅读 · 2020年1月2日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

2019年机器学习框架回顾

2019年机器学习框架回顾

专知会员服务

36+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

热门VIP内容

开通专知VIP会员享更多权益服务

《美空军条令出版物：战略打击》最新条令

《高能激光武器》22页slides

军事前沿模型

《面向小型无人机或无人飞行器的创新雷达探测与人工智能分类技术》263页

相关资讯

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

征稿 | CFP：Special Issue of NLP and KG(JCR Q2，IF2.67)

征稿 | CFP：Special Issue of NLP and KG(JCR Q2，IF2.67)

开放知识图谱

1+阅读 · 2022年4月4日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

中国图象图形学学会CSIG

0+阅读 · 2021年11月3日

RoBERTa for Chinese：大规模中文预训练RoBERTa模型

RoBERTa for Chinese：大规模中文预训练RoBERTa模型

AINLP

30+阅读 · 2019年9月8日

BERT/Transformer/迁移学习NLP资源大列表

BERT/Transformer/迁移学习NLP资源大列表

专知

19+阅读 · 2019年6月9日

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

AINLP

40+阅读 · 2019年6月9日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

相关论文

GIT: A Generative Image-to-text Transformer for Vision and Language

Arxiv

1+阅读 · 2022年8月22日

A Novel Multi-Task Learning Approach for Context-Sensitive Compound Type Identification in Sanskrit

A Novel Multi-Task Learning Approach for Context-Sensitive Compound Type Identification in Sanskrit

Arxiv

0+阅读 · 2022年8月22日

Linear regression and its inference on noisy network-linked data

Arxiv

0+阅读 · 2022年8月19日

Vector Quantized Diffusion Model with CodeUnet for Text-to-Sign Pose Sequences Generation

Arxiv

0+阅读 · 2022年8月19日

Analyzing Robustness of End-to-End Neural Models for Automatic Speech Recognition

Arxiv

0+阅读 · 2022年8月17日

Attention Bottlenecks for Multimodal Fusion

Arxiv

31+阅读 · 2021年6月30日

PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization

Arxiv

17+阅读 · 2020年6月2日

UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training

Arxiv

15+阅读 · 2020年2月28日

UniViLM: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation

UniViLM: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation

Arxiv

19+阅读 · 2020年2月15日

Pre-Training with Whole Word Masking for Chinese BERT

Arxiv

11+阅读 · 2019年6月19日

相关基金

视觉信息的局部特征表示及应用研究

国家自然科学基金

2+阅读 · 2015年12月31日

溶剂辅助激发态链式长程多质子转移反应动力学的理论研究

国家自然科学基金

0+阅读 · 2013年12月31日

高效光纤型SERS探针的界面设计、可控制备及性能研究

国家自然科学基金

0+阅读 · 2013年12月31日

Perp在类风湿性关节炎外周Th17细胞存活中的作用研究

国家自然科学基金

0+阅读 · 2012年12月31日

汉藏双语个性化多语种语音合成中的语言建模的研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于替换的实时Web服务事务处理

国家自然科学基金

0+阅读 · 2012年12月31日

框架的冗余度

国家自然科学基金

0+阅读 · 2012年12月31日

全树枝状磷光分子简单混合构筑发光层的白光有机电致发光器件

国家自然科学基金

0+阅读 · 2012年12月31日

生物活性导向的麦角碱的多样性合成

国家自然科学基金

0+阅读 · 2012年12月31日

约化群酉表示的branching law及其应用

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员