How to boost speech pre-training with textual data is an unsolved problem, because speech and text are disparate modalities with distinct characteristics. In this paper, we propose a cross-modal Speech and Language Model (SpeechLM) that explicitly aligns speech and text pre-training through a pre-defined unified discrete representation. Specifically, we introduce two alternative discrete tokenizers to bridge the speech and text modalities, a phoneme-unit tokenizer and a hidden-unit tokenizer, both of which can be trained with a small amount of paired speech-text data. Using the trained tokenizers, we convert unlabeled speech and text data into phoneme-unit or hidden-unit tokens. The pre-training objective is designed to unify speech and text into the same discrete semantic space with a shared Transformer network. Leveraging only 10K text sentences, our SpeechLM achieves a 16\% relative WER reduction over the best base-model performance (from 6.8 to 5.7) on the public LibriSpeech ASR benchmark. Moreover, with fewer parameters, SpeechLM even outperforms previous SOTA models on the CoVoST-2 speech translation tasks. We also evaluate SpeechLM on various spoken language processing tasks under the universal representation evaluation framework SUPERB, demonstrating significant improvements on content-related tasks. Our code and models are available at https://aka.ms/SpeechLM.
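As a rough illustration of the unified-representation idea (not the authors' implementation), the sketch below shows how both modalities could be mapped into one shared discrete unit vocabulary and then processed by a single Transformer. The tokenizer functions, vocabulary size, and model dimensions here are hypothetical placeholders; in the paper the tokenizers are trained on a small amount of paired speech-text data.

```python
# Conceptual sketch only: speech and text are first mapped to the same
# discrete unit inventory (phoneme units or hidden units), then consumed
# by one shared Transformer encoder.
import torch
import torch.nn as nn

UNIT_VOCAB_SIZE = 512   # assumed size of the shared discrete unit inventory
D_MODEL = 256           # assumed model dimension

class UnifiedEncoder(nn.Module):
    """One Transformer over the shared discrete semantic space."""
    def __init__(self):
        super().__init__()
        self.unit_embedding = nn.Embedding(UNIT_VOCAB_SIZE, D_MODEL)
        layer = nn.TransformerEncoderLayer(d_model=D_MODEL, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, unit_ids):  # (batch, seq) of discrete unit ids
        return self.encoder(self.unit_embedding(unit_ids))

def text_to_units(sentence: str) -> torch.Tensor:
    """Hypothetical phoneme-unit tokenizer: text -> discrete unit ids."""
    # Placeholder: a real tokenizer would map graphemes to phoneme units.
    return torch.randint(0, UNIT_VOCAB_SIZE, (1, len(sentence.split()) * 4))

def speech_to_units(waveform: torch.Tensor) -> torch.Tensor:
    """Hypothetical hidden-unit tokenizer: speech -> discrete unit ids."""
    # Placeholder: a real tokenizer would quantize acoustic features.
    return torch.randint(0, UNIT_VOCAB_SIZE, (1, waveform.shape[-1] // 320))

model = UnifiedEncoder()
text_repr = model(text_to_units("speech and text share one space"))
speech_repr = model(speech_to_units(torch.randn(1, 16000)))  # ~1 s at 16 kHz
print(text_repr.shape, speech_repr.shape)
```

Because both token streams index the same embedding table and pass through the same encoder, the two modalities are forced into a common discrete semantic space, which is the core intuition behind SpeechLM's pre-training objective.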