Maestro-U: Leveraging joint speech-text representation learning for zero supervised speech ASR - 专知论文

Speech recognition · Reducible · Supervision · Representation · Transcription

October 21, 2022

Maestro-U: Leveraging joint speech-text representation learning for zero supervised speech ASR

Zhehuai Chen,Ankur Bapna,Andrew Rosenberg,Yu Zhang,Bhuvana Ramabhadran,Pedro Moreno,Nanxin Chen

from arxiv, Accepted by SLT 2022

Training state-of-the-art Automated Speech Recognition (ASR) models typically requires a substantial amount of transcribed speech. In this work, we demonstrate that a modality-matched joint speech and text model can be leveraged to train a massively multilingual ASR model without any supervised (manually transcribed) speech for some languages. This paper explores the use of jointly learnt speech and text representations in a massively multilingual, zero supervised speech, real-world setting to expand the set of languages covered by ASR with only unlabeled speech and text in the target languages. Using the FLEURS dataset, we define the task to cover 102 languages, where transcribed speech is available in 52 of these languages and can be used to improve end-to-end ASR quality on the remaining 50. First, we show that by combining speech representations with byte-level text representations and use of language embeddings, we can dramatically reduce the Character Error Rate (CER) on languages with no supervised speech from 64.8% to 30.8%, a relative reduction of 53%. Second, using a subset of South Asian languages we show that Maestro-U can promote knowledge transfer from languages with supervised speech even when there is limited to no graphemic overlap. Overall, Maestro-U closes the gap to oracle performance by 68.5% relative and reduces the CER of 19 languages below 15%.
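The metrics quoted in the abstract can be illustrated with a small Python sketch. It is not the authors' evaluation code: the function names (`edit_distance`, `cer`, `relative_reduction`, `byte_tokens`) are hypothetical, CER is taken in its common form (character-level edit distance divided by reference length), and "byte-level text representation" is assumed here to mean the UTF-8 bytes of the transcript. With the rounded CERs from the abstract, the relative reduction comes out at about 52.5%, close to the reported 53%; the small difference presumably comes from rounding of the inputs.

```python
from typing import List


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference character
                            curr[j - 1] + 1,      # insert a hypothesis character
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate in percent: edits divided by reference length."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative reduction of an error rate, in percent of the baseline."""
    return 100.0 * (baseline - improved) / baseline


def byte_tokens(text: str) -> List[int]:
    """Assumed byte-level view of text: its UTF-8 bytes as integers 0..255."""
    return list(text.encode("utf-8"))


if __name__ == "__main__":
    print(cer("hello", "hallo"))                     # one substitution in five characters
    print(round(relative_reduction(64.8, 30.8), 1))  # CERs quoted in the abstract
    # Different scripts share no graphemes but map into the same 256-symbol vocabulary.
    print(byte_tokens("क"), byte_tokens("க"))
```

Under this assumption, two scripts with no characters in common still draw from the same 256-symbol inventory, which is one plausible reading of how transfer can happen with limited to no graphemic overlap.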

