Self-supervised speech representations such as wav2vec 2.0 and HuBERT have brought revolutionary progress to Automatic Speech Recognition (ASR). However, it has not been fully shown that self-supervised models yield better performance on tasks other than ASR. In this work, we explore partial fine-tuning and entire fine-tuning of wav2vec 2.0 and HuBERT pre-trained models on three non-ASR speech tasks: Speech Emotion Recognition, Speaker Verification, and Spoken Language Understanding. We also compare pre-trained models with and without ASR fine-tuning. With simple downstream frameworks, the best scores reach 79.58% weighted accuracy for Speech Emotion Recognition on IEMOCAP, 2.36% equal error rate for Speaker Verification on VoxCeleb1, and 87.51% accuracy for Intent Classification and 75.32% F1 for Slot Filling on SLURP, setting a new state of the art on all three benchmarks and showing that fine-tuned wav2vec 2.0 and HuBERT models can better learn prosodic, voice-print, and semantic representations.
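The abstract does not fix an implementation, but a minimal sketch may help make the two fine-tuning strategies concrete. The sketch below assumes HuggingFace transformers and reads "partial fine-tuning" as freezing the convolutional feature encoder while keeping the transformer layers trainable, with "entire fine-tuning" updating all weights; the `DownstreamClassifier` class, the mean-pooling head, and the checkpoint name are illustrative assumptions, not the paper's exact setup.

```python
# Hedged sketch: "partial" vs. "entire" fine-tuning of wav2vec 2.0 with a
# simple downstream classification head (assumed reading of the paper's setup).
import torch
import torch.nn as nn
from transformers import Wav2Vec2Model

class DownstreamClassifier(nn.Module):
    def __init__(self, num_classes: int, partial: bool = True):
        super().__init__()
        self.encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
        if partial:
            # Partial fine-tuning (assumption): freeze the CNN feature
            # encoder; the transformer layers and the head stay trainable.
            for p in self.encoder.feature_extractor.parameters():
                p.requires_grad = False
        # Entire fine-tuning: leave every encoder parameter trainable.
        self.head = nn.Linear(self.encoder.config.hidden_size, num_classes)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # (batch, time, hidden) frame-level representations
        hidden = self.encoder(waveform).last_hidden_state
        # Average pooling over time gives one utterance-level vector.
        pooled = hidden.mean(dim=1)
        return self.head(pooled)

# Example: a 4-class emotion classifier (e.g. the IEMOCAP label set) on
# a batch of two one-second 16 kHz clips of random noise.
model = DownstreamClassifier(num_classes=4, partial=True)
logits = model(torch.randn(2, 16000))
```

Mean pooling over time is one simple choice of "downstream framework" consistent with the abstract's wording; HuBERT can be substituted by loading `HubertModel` in place of `Wav2Vec2Model`.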