WavLM: 全堆语音处理大型自我监督预培训 (WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing) - 专知论文

会员服务 ·

0

Processing（编程语言） · 去噪 · 全 · MoDELS · 学成 ·

2022 年 1 月 24 日

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing

翻译：WavLM: 全堆语音处理大型自我监督预培训

Sanyuan Chen,Chengyi Wang,Zhengyang Chen,Yu Wu,Shujie Liu,Zhuo Chen,Jinyu Li,Naoyuki Kanda,Takuya Yoshioka,Xiong Xiao,Jian Wu,Long Zhou,Shuo Ren,Yanmin Qian,Yao Qian,Jian Wu,Michael Zeng,Xiangzhan Yu,Furu Wei

Self-supervised learning (SSL) achieves great success in speech recognition, while limited exploration has been attempted for other speech processing tasks. As speech signal contains multi-faceted information including speaker identity, paralinguistics, spoken content, etc., learning universal representations for all speech tasks is challenging. To tackle the problem, we propose a new pre-trained model, WavLM, to solve full-stack downstream speech tasks. WavLM jointly learns masked speech prediction and denoising in pre-training. By this means, WavLM does not only keep the speech content modeling capability by the masked speech prediction, but also improves the potential to non-ASR tasks by the speech denoising. In addition, WavLM employs gated relative position bias for the Transformer structure to better capture sequence ordering of input speech, and scale up the training dataset from 60k hours to 94k hours. WavLM Large achieves state-of-the-art performance on the SUPERB benchmark, and brings significant improvements for various speech processing tasks on their representative benchmarks. The code and pre-trained models are available at https://aka.ms/wavlm.

翻译：自我监督的学习(SSL)在语音识别方面取得巨大成功,而其他语音处理任务则尝试了有限的探索。由于语音信号包含多方面的信息,包括演讲者身份、语言学、口语内容等,学习所有演讲任务的普遍代表性具有挑战性。为了解决这个问题,我们提议了一个新的预培训模式,即WavLM, 以解决全斯塔克下游演讲任务。WavLM 联合学习了掩盖的语音预测和在培训前的解密。通过这个方法,WavLM不仅保持了蒙面语音预测的语音内容建模能力,而且还提高了通过语言解译的非ASR任务的潜力。此外,WavLM为变换结构设置了封闭的相对位置偏差,以更好地捕捉到对投入演讲的顺序,并将培训数据集从60k小时提高到94k小时。WavLM Grand在SUPERB基准上实现了最先进的表现,并大大改进了代表基准上的各种语音处理任务。

0

相关内容

Processing（编程语言）

Processing（编程语言）

Processing 是一门开源编程语言和与之配套的集成开发环境（IDE）的名称。Processing 在电子艺术和视觉设计社区被用来教授编程基础，并运用于大量的新媒体和互动艺术作品中。

【UIUC】最新《自监督学习》教程，51页ppt，Self-supervised learning

【UIUC】最新《自监督学习》教程，51页ppt，Self-supervised learning

专知会员服务

84+阅读 · 2020年11月25日

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

专知会员服务

44+阅读 · 2020年11月2日

【ACL2020-Facebook AI】跨语言表示学习，Unsupervised Cross-lingual Representation Learning at Scale

【ACL2020-Facebook AI】跨语言表示学习，Unsupervised Cross-lingual Representation Learning at Scale

专知会员服务

27+阅读 · 2020年4月5日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

【Google ICLR2020论文】嵌入式大规模检索的预训练任务，Pre-training Tasks for Embedding-based Large-scale Retrieval

【Google ICLR2020论文】嵌入式大规模检索的预训练任务，Pre-training Tasks for Embedding-based Large-scale Retrieval

专知会员服务

28+阅读 · 2020年2月12日

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

专知会员服务

43+阅读 · 2020年1月28日

【ICDAR2019教程】用于文档分析、文本识别和语言建模的深度学习（Deep Learning for Document Analysis, Text Recognition, and Language Modeling）

【ICDAR2019教程】用于文档分析、文本识别和语言建模的深度学习（Deep Learning for Document Analysis, Text Recognition, and Language Modeling）

专知会员服务

22+阅读 · 2019年12月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

RoBERTa中文预训练模型：RoBERTa for Chinese

RoBERTa中文预训练模型：RoBERTa for Chinese

PaperWeekly

57+阅读 · 2019年9月16日

BERT/Transformer/迁移学习NLP资源大列表

BERT/Transformer/迁移学习NLP资源大列表

专知

19+阅读 · 2019年6月9日

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

AINLP

40+阅读 · 2019年6月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

深度自进化聚类：Deep Self-Evolution Clustering

深度自进化聚类：Deep Self-Evolution Clustering

我爱读PAMI

15+阅读 · 2019年4月13日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

大数据高效能存储与管理方法研究

国家自然科学基金

3+阅读 · 2014年12月31日

基于人脸的性别分类和年龄估计统一学习框架及其拓展研究

国家自然科学基金

0+阅读 · 2014年12月31日

压电动能收集电路极限俘能性能研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于知识迁移的多时相高分辨率遥感影像分类方法研究

国家自然科学基金

1+阅读 · 2013年12月31日

基于Exemplar-Classifier思想的高分辨率光学遥感影像目标识别研究

国家自然科学基金

2+阅读 · 2013年12月31日

高分测绘卫星动态成像质量与非平稳像移的多参数耦合机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

痕量气体卫星反演中大气Ring效应的同步探测机理与估算模型研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于语意多尺度马尔可夫模型的高分辨率遥感影像分割

国家自然科学基金

1+阅读 · 2009年12月31日

基于人类视觉感知的高分辨率卫星遥感图像智能分类方法研究

国家自然科学基金

1+阅读 · 2009年12月31日

离子束改性粉煤灰微观结构及吸附性能的机理研究

国家自然科学基金

0+阅读 · 2009年12月31日

WuDaoMM: A large-scale Multi-Modal Dataset for Pre-training models

Arxiv

0+阅读 · 2022年4月19日

Self-Supervised Arbitrary-Scale Point Clouds Upsampling via Implicit Neural Representation

Arxiv

0+阅读 · 2022年4月18日

METRO: Efficient Denoising Pretraining of Large Scale Autoencoding Language Models with Model Generated Signals

Arxiv

0+阅读 · 2022年4月16日

STRATA: Word Boundaries & Phoneme Recognition From Continuous Urdu Speech using Transfer Learning, Attention, & Data Augmentation

Arxiv

0+阅读 · 2022年4月16日

Evaluation Benchmarks for Spanish Sentence Representations

Arxiv

0+阅读 · 2022年4月15日

Graph Self-Supervised Learning: A Survey

Arxiv

15+阅读 · 2021年8月5日

Dense Contrastive Learning for Self-Supervised Visual Pre-Training

Arxiv

18+阅读 · 2021年4月4日

PROP: Pre-training with Representative Words Prediction for Ad-hoc Retrieval

Arxiv

11+阅读 · 2020年10月20日

PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization

Arxiv

17+阅读 · 2020年6月2日

Look-into-Object: Self-supervised Structure Modeling for Object Recognition

Look-into-Object: Self-supervised Structure Modeling for Object Recognition

Arxiv

15+阅读 · 2020年3月31日

VIP会员

文章信息

相关主题

Processing（编程语言）

相关VIP内容

【UIUC】最新《自监督学习》教程，51页ppt，Self-supervised learning

【UIUC】最新《自监督学习》教程，51页ppt，Self-supervised learning

专知会员服务

84+阅读 · 2020年11月25日

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

专知会员服务

44+阅读 · 2020年11月2日

【ACL2020-Facebook AI】跨语言表示学习，Unsupervised Cross-lingual Representation Learning at Scale

【ACL2020-Facebook AI】跨语言表示学习，Unsupervised Cross-lingual Representation Learning at Scale

专知会员服务

27+阅读 · 2020年4月5日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

【Google ICLR2020论文】嵌入式大规模检索的预训练任务，Pre-training Tasks for Embedding-based Large-scale Retrieval

【Google ICLR2020论文】嵌入式大规模检索的预训练任务，Pre-training Tasks for Embedding-based Large-scale Retrieval

专知会员服务

28+阅读 · 2020年2月12日

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

专知会员服务

43+阅读 · 2020年1月28日

【ICDAR2019教程】用于文档分析、文本识别和语言建模的深度学习（Deep Learning for Document Analysis, Text Recognition, and Language Modeling）

【ICDAR2019教程】用于文档分析、文本识别和语言建模的深度学习（Deep Learning for Document Analysis, Text Recognition, and Language Modeling）

专知会员服务

22+阅读 · 2019年12月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

热门VIP内容

开通专知VIP会员享更多权益服务

从社会学实验到行为仿真：理解基于Agent的观点动力学建模思维

中英文版《GPT-5 System Card速览》报告

ACL 2025 | 大模型结构化知识提示的泛化能力研究

【普林斯顿博士论文】大型模型的高效推理

相关资讯

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

RoBERTa中文预训练模型：RoBERTa for Chinese

RoBERTa中文预训练模型：RoBERTa for Chinese

PaperWeekly

57+阅读 · 2019年9月16日

BERT/Transformer/迁移学习NLP资源大列表

BERT/Transformer/迁移学习NLP资源大列表

专知

19+阅读 · 2019年6月9日

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

AINLP

40+阅读 · 2019年6月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

深度自进化聚类：Deep Self-Evolution Clustering

深度自进化聚类：Deep Self-Evolution Clustering

我爱读PAMI

15+阅读 · 2019年4月13日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

WuDaoMM: A large-scale Multi-Modal Dataset for Pre-training models

Arxiv

0+阅读 · 2022年4月19日

Self-Supervised Arbitrary-Scale Point Clouds Upsampling via Implicit Neural Representation

Arxiv

0+阅读 · 2022年4月18日

METRO: Efficient Denoising Pretraining of Large Scale Autoencoding Language Models with Model Generated Signals

Arxiv

0+阅读 · 2022年4月16日

STRATA: Word Boundaries & Phoneme Recognition From Continuous Urdu Speech using Transfer Learning, Attention, & Data Augmentation

Arxiv

0+阅读 · 2022年4月16日

Evaluation Benchmarks for Spanish Sentence Representations

Arxiv

0+阅读 · 2022年4月15日

Graph Self-Supervised Learning: A Survey

Arxiv

15+阅读 · 2021年8月5日

Dense Contrastive Learning for Self-Supervised Visual Pre-Training

Arxiv

18+阅读 · 2021年4月4日

PROP: Pre-training with Representative Words Prediction for Ad-hoc Retrieval

Arxiv

11+阅读 · 2020年10月20日

PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization

Arxiv

17+阅读 · 2020年6月2日

Look-into-Object: Self-supervised Structure Modeling for Object Recognition

Look-into-Object: Self-supervised Structure Modeling for Object Recognition

Arxiv

15+阅读 · 2020年3月31日

相关基金

大数据高效能存储与管理方法研究

国家自然科学基金

3+阅读 · 2014年12月31日

基于人脸的性别分类和年龄估计统一学习框架及其拓展研究

国家自然科学基金

0+阅读 · 2014年12月31日

压电动能收集电路极限俘能性能研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于知识迁移的多时相高分辨率遥感影像分类方法研究

国家自然科学基金

1+阅读 · 2013年12月31日

基于Exemplar-Classifier思想的高分辨率光学遥感影像目标识别研究

国家自然科学基金

2+阅读 · 2013年12月31日

高分测绘卫星动态成像质量与非平稳像移的多参数耦合机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

痕量气体卫星反演中大气Ring效应的同步探测机理与估算模型研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于语意多尺度马尔可夫模型的高分辨率遥感影像分割

国家自然科学基金

1+阅读 · 2009年12月31日

基于人类视觉感知的高分辨率卫星遥感图像智能分类方法研究

国家自然科学基金

1+阅读 · 2009年12月31日

离子束改性粉煤灰微观结构及吸附性能的机理研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员