LAE: 单语和多种语言的ASR语言软件编码器 (LAE: Language-Aware Encoder for Monolingual and Multilingual ASR) - 专知论文

会员服务 ·

0

语音识别 · INFORMS · 块 · Learning · Performer ·

2022 年 6 月 5 日

LAE: Language-Aware Encoder for Monolingual and Multilingual ASR

翻译：LAE: 单语和多种语言的ASR语言软件编码器

Jinchuan Tian,Jianwei Yu,Chunlei Zhang,Chao Weng,Yuexian Zou,Dong Yu

Despite the rapid progress in automatic speech recognition (ASR) research, recognizing multilingual speech using a unified ASR system remains highly challenging. Previous works on multilingual speech recognition mainly focus on two directions: recognizing multiple monolingual speech or recognizing code-switched speech that uses different languages interchangeably within a single utterance. However, a pragmatic multilingual recognizer is expected to be compatible with both directions. In this work, a novel language-aware encoder (LAE) architecture is proposed to handle both situations by disentangling language-specific information and generating frame-level language-aware representations during encoding. In the LAE, the primary encoding is implemented by the shared block while the language-specific blocks are used to extract specific representations for each language. To learn language-specific information discriminatively, a language-aware training method is proposed to optimize the language-specific blocks in LAE. Experiments conducted on Mandarin-English code-switched speech suggest that the proposed LAE is capable of discriminating different languages in frame-level and shows superior performance on both monolingual and multilingual ASR tasks. With either a real-recorded or simulated code-switched dataset, the proposed LAE achieves statistically significant improvements on both CTC and neural transducer systems. Code is released

翻译：尽管在自动语音识别(ASR)研究方面进展迅速,但使用统一的ASR系统承认多语种语言的研究仍然极具挑战性。以前关于多语种语音识别的工作主要侧重于两个方向:承认多种单一语言的语音,或承认在单一语句中可互换使用不同语言的密码转换语言的语音。然而,一个实用的多语种识别器预计将与两个方向相容。在这项工作中,提议建立一个新颖的有语言意识的编码器(LAE)结构,通过拆分特定语言的信息和在编码过程中生成框架一级的语言觉悟表征,来处理两种情况。在LAE中,主要编码由共享块执行,而语言特定块则用于为每种语言提取具体的表达方式。为了有区别地学习特定语言的信息,建议一种有语言意识的培训方法来优化LAEE的特定语言块。对曼达林英语编码转换码的实验表明,拟议的LAEE能够在框架一级歧视不同语言,并显示单一语言和多语种语言任务上的优异性表现。在实际记录或模拟的代码转换系统上都取得了显著的改进。

0

相关内容

语音识别

语音识别是计算机科学和计算语言学的一个跨学科子领域，它发展了一些方法和技术，使计算机可以将口语识别和翻译成文本。它也被称为自动语音识别（ASR），计算机语音识别或语音转文本（STT）。它整合了计算机科学，语言学和计算机工程领域的知识和研究。

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

60+阅读 · 2022年4月22日

“CVPR 2021 接受论文列表 1663篇论文都在这了

专知会员服务

32+阅读 · 2021年6月12日

ICLR 2021杰出论文奖出炉，8篇论文上榜！

专知会员服务

26+阅读 · 2021年4月2日

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

专知会员服务

44+阅读 · 2020年11月2日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

抢鲜看！13篇CVPR2020论文链接/开源代码/解读

抢鲜看！13篇CVPR2020论文链接/开源代码/解读

专知会员服务

50+阅读 · 2020年2月26日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

IEEE TII Call For Papers

IEEE TII Call For Papers

CCF多媒体专委会

3+阅读 · 2022年3月24日

ACM TOMM Call for Papers

ACM TOMM Call for Papers

CCF多媒体专委会

2+阅读 · 2022年3月23日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Workshop

【ICIG2021】Latest News & Announcements of the Workshop

中国图象图形学学会CSIG

0+阅读 · 2021年12月20日

【ICIG2021】Latest News & Announcements of the Plenary Talk1

【ICIG2021】Latest News & Announcements of the Plenary Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年11月1日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

肥胖相关Hepatokine LECT2在肝脏中的调控及机制

国家自然科学基金

1+阅读 · 2015年12月31日

γ-Synuclein调控MAPK-ERK-JNK信号通路及细胞周期促进子宫内膜癌恶性进展的机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

靶向调节HDAC6增加t-PA静脉溶栓治疗的有效性及安全性研究

国家自然科学基金

0+阅读 · 2013年12月31日

Mg-Zn-RE(Ce,Nd)系镁合金强化相析出过程与强化机制的研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于融合智能算法斜拉桥振动控制Benchmark问题的混合控制策略研究

国家自然科学基金

0+阅读 · 2013年12月31日

MG53 对急性心肌梗死植入干细胞的保护作用及其机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

航空发动机主轴轴承故障特征提取、智能诊断及预示方法研究

国家自然科学基金

2+阅读 · 2012年12月31日

耳叶水苋对ALS抑制剂的抗药性机理

国家自然科学基金

0+阅读 · 2011年12月31日

PAK5抑制血管平滑肌细胞凋亡作用机制的研究

国家自然科学基金

0+阅读 · 2011年12月31日

强磁场对铁磁合金调幅分解行为的影响及机理研究

国家自然科学基金

0+阅读 · 2009年12月31日

Efficient Modeling of Future Context for Image Captioning

Arxiv

0+阅读 · 2022年7月22日

Leveraging Action Affinity and Continuity for Semi-supervised Temporal Action Segmentation

Arxiv

0+阅读 · 2022年7月21日

Is an Object-Centric Video Representation Beneficial for Transfer?

Is an Object-Centric Video Representation Beneficial for Transfer?

Arxiv

0+阅读 · 2022年7月20日

The Atlas Benchmark: an Automated Evaluation Framework for Human Motion Prediction

Arxiv

0+阅读 · 2022年7月20日

Speech Enhancement Guided by Contextual Articulatory Information

Arxiv

0+阅读 · 2022年7月19日

ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding

Arxiv

0+阅读 · 2022年7月19日

MVFNet: Multi-View Fusion Network for Efficient Video Recognition

Arxiv

13+阅读 · 2021年1月5日

Learning in the Frequency Domain

Learning in the Frequency Domain

Arxiv

11+阅读 · 2020年3月12日

Meta Learning for End-to-End Low-Resource Speech Recognition

Meta Learning for End-to-End Low-Resource Speech Recognition

Arxiv

20+阅读 · 2019年10月26日

Dissecting Contextual Word Embeddings: Architecture and Representation

Dissecting Contextual Word Embeddings: Architecture and Representation

Arxiv

22+阅读 · 2018年8月27日

VIP会员

文章信息

相关主题

相关VIP内容

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

60+阅读 · 2022年4月22日

“CVPR 2021 接受论文列表 1663篇论文都在这了

专知会员服务

32+阅读 · 2021年6月12日

ICLR 2021杰出论文奖出炉，8篇论文上榜！

专知会员服务

26+阅读 · 2021年4月2日

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

专知会员服务

44+阅读 · 2020年11月2日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

抢鲜看！13篇CVPR2020论文链接/开源代码/解读

抢鲜看！13篇CVPR2020论文链接/开源代码/解读

专知会员服务

50+阅读 · 2020年2月26日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《全谱战争——从拓宽工具到思考不可思考之事》

《FPV武装无人机的战斗飞行艺术与科学》最新报告

无人机作战：演进、创新与未来战场

《反无人机：用于无人机探测与定位的多输入多输出雷达》最新69页

相关资讯

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

IEEE TII Call For Papers

IEEE TII Call For Papers

CCF多媒体专委会

3+阅读 · 2022年3月24日

ACM TOMM Call for Papers

ACM TOMM Call for Papers

CCF多媒体专委会

2+阅读 · 2022年3月23日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Workshop

【ICIG2021】Latest News & Announcements of the Workshop

中国图象图形学学会CSIG

0+阅读 · 2021年12月20日

【ICIG2021】Latest News & Announcements of the Plenary Talk1

【ICIG2021】Latest News & Announcements of the Plenary Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年11月1日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

相关论文

Efficient Modeling of Future Context for Image Captioning

Arxiv

0+阅读 · 2022年7月22日

Leveraging Action Affinity and Continuity for Semi-supervised Temporal Action Segmentation

Arxiv

0+阅读 · 2022年7月21日

Is an Object-Centric Video Representation Beneficial for Transfer?

Is an Object-Centric Video Representation Beneficial for Transfer?

Arxiv

0+阅读 · 2022年7月20日

The Atlas Benchmark: an Automated Evaluation Framework for Human Motion Prediction

Arxiv

0+阅读 · 2022年7月20日

Speech Enhancement Guided by Contextual Articulatory Information

Arxiv

0+阅读 · 2022年7月19日

ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding

Arxiv

0+阅读 · 2022年7月19日

MVFNet: Multi-View Fusion Network for Efficient Video Recognition

Arxiv

13+阅读 · 2021年1月5日

Learning in the Frequency Domain

Learning in the Frequency Domain

Arxiv

11+阅读 · 2020年3月12日

Meta Learning for End-to-End Low-Resource Speech Recognition

Meta Learning for End-to-End Low-Resource Speech Recognition

Arxiv

20+阅读 · 2019年10月26日

Dissecting Contextual Word Embeddings: Architecture and Representation

Dissecting Contextual Word Embeddings: Architecture and Representation

Arxiv

22+阅读 · 2018年8月27日

相关基金

肥胖相关Hepatokine LECT2在肝脏中的调控及机制

国家自然科学基金

1+阅读 · 2015年12月31日

γ-Synuclein调控MAPK-ERK-JNK信号通路及细胞周期促进子宫内膜癌恶性进展的机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

靶向调节HDAC6增加t-PA静脉溶栓治疗的有效性及安全性研究

国家自然科学基金

0+阅读 · 2013年12月31日

Mg-Zn-RE(Ce,Nd)系镁合金强化相析出过程与强化机制的研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于融合智能算法斜拉桥振动控制Benchmark问题的混合控制策略研究

国家自然科学基金

0+阅读 · 2013年12月31日

MG53 对急性心肌梗死植入干细胞的保护作用及其机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

航空发动机主轴轴承故障特征提取、智能诊断及预示方法研究

国家自然科学基金

2+阅读 · 2012年12月31日

耳叶水苋对ALS抑制剂的抗药性机理

国家自然科学基金

0+阅读 · 2011年12月31日

PAK5抑制血管平滑肌细胞凋亡作用机制的研究

国家自然科学基金

0+阅读 · 2011年12月31日

强磁场对铁磁合金调幅分解行为的影响及机理研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员