Self-supervised learning (SSL) of high-level speech representations has been a popular approach to building Automatic Speech Recognition (ASR) systems in low-resource settings. However, the common assumption made in the literature is that a considerable amount of unlabeled data from the same domain or language is available for SSL pre-training, which is often not feasible in a real-world setting. In this paper, as part of the Interspeech Gram Vaani ASR challenge, we study the effect of the domain, language, dataset size, and other aspects of the upstream SSL pre-training data on the performance of the final low-resource downstream ASR task. We also build on the continued pre-training paradigm to study the effect of the prior knowledge possessed by models trained using SSL. Extensive experiments and studies reveal that the performance of ASR systems is sensitive to the data used for SSL pre-training, and that their performance improves with increasing similarity and volume of the pre-training data. We believe our work will help the speech community build better ASR systems in low-resource settings and steer research towards improving generalization in SSL-based pre-training for speech systems.