Wav2vec 2.0 is a recently proposed self-supervised framework for speech representation learning. It follows a two-stage training process of pre-training and fine-tuning, and performs well in speech recognition tasks, especially in ultra-low-resource cases. In this work, we extend the self-supervised framework to speaker verification and language identification. First, preliminary experiments indicate that wav2vec 2.0 can capture information about both the speaker and the language. We then demonstrate the effectiveness of wav2vec 2.0 on each of the two tasks. For speaker verification, we obtain a new state-of-the-art result, an Equal Error Rate (EER) of 3.61% on the VoxCeleb1 dataset. For language identification, we obtain an EER of 12.02% under the 1-second condition and an EER of 3.47% under the full-length condition of the AP17-OLR dataset. Finally, we achieve unified modeling of the two tasks with a single model trained via multi-task learning.
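All results above are reported as Equal Error Rate (EER), the operating point at which the false-accept rate equals the false-reject rate. As a reference for readers, a minimal sketch of computing EER from a list of verification trial scores is given below; the function name and the use of NumPy are illustrative choices, not taken from the paper.

```python
import numpy as np

def compute_eer(scores, labels):
    """Equal Error Rate: the point where false-accept rate == false-reject rate.

    scores: similarity score per trial (higher = more likely a target trial)
    labels: 1 for target (same speaker/language) trials, 0 for non-target trials
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)

    # Sort trials by score, descending; sweeping the threshold down this
    # ordering accepts one more trial at each step.
    order = np.argsort(scores)[::-1]
    labels = labels[order]

    n_pos = labels.sum()          # number of target trials
    n_neg = len(labels) - n_pos   # number of non-target trials

    tp = np.cumsum(labels)        # true accepts at each threshold
    fp = np.cumsum(1 - labels)    # false accepts at each threshold

    frr = 1.0 - tp / n_pos        # false-reject rate
    far = fp / n_neg              # false-accept rate

    # EER is where the two error curves cross; on a finite trial list we take
    # the midpoint at the closest crossing.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0
```

For a perfectly separable trial list (all target scores above all non-target scores), `compute_eer` returns 0.0; scores such as those in the abstract (e.g. 3.61% on VoxCeleb1) correspond to partially overlapping score distributions.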