We investigate the performance of self-supervised pretraining frameworks on pathological speech datasets for automatic speech recognition (ASR). Modern end-to-end models require thousands of hours of data to train well, but only a small number of pathological speech datasets are publicly available. A proven solution to this problem is to first pretrain the model on large amounts of healthy speech and then fine-tune it on the pathological speech datasets. A newer pretraining framework, self-supervised learning (SSL), trains a network using only unlabeled speech data, providing more flexibility in training data requirements and allowing more speech data to be used in pretraining. We investigate SSL frameworks such as the wav2vec 2.0 and WavLM models under different setups and compare their performance with different supervised pretraining setups, using two types of pathological speech, namely, Japanese electrolaryngeal speech and English dysarthric speech. Our results show that although SSL has shown success with minimally resourced healthy speech, we do not find this to be the case with pathological speech. The best supervised setup outperforms the best SSL setup by 13.9% character error rate in electrolaryngeal speech and 16.8% word error rate in dysarthric speech.
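As a rough illustration of the pretrain-then-fine-tune recipe described above, the sketch below loads a pretrained wav2vec 2.0 SSL checkpoint and fine-tunes it with a CTC head for character-level ASR using the Hugging Face Transformers API. This is a minimal sketch, not the authors' exact setup: the checkpoint name, learning rate, and the choice to freeze the convolutional feature encoder are illustrative assumptions.

```python
# Minimal sketch: fine-tuning a pretrained wav2vec 2.0 model for ASR with CTC.
# Checkpoint name and hyperparameters are illustrative, not the paper's setup.
import torch
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

# Freezing the convolutional feature encoder is a common choice when
# fine-tuning on small target datasets such as pathological speech.
model.freeze_feature_encoder()

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

def finetune_step(waveform, transcript, sampling_rate=16000):
    """One gradient step on a single (audio, transcript) pair."""
    inputs = processor(waveform, sampling_rate=sampling_rate,
                       return_tensors="pt", padding=True)
    labels = processor.tokenizer(transcript, return_tensors="pt").input_ids
    outputs = model(input_values=inputs.input_values, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return outputs.loss.item()
```

In practice, the same loop would run over batches of the pathological speech training set, with the character error rate (for Japanese electrolaryngeal speech) or word error rate (for English dysarthric speech) evaluated on a held-out set.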