UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training

October 12, 2021

Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu

From arXiv; ICASSP 2022 submission

Self-supervised learning (SSL) is a long-standing goal for speech processing, since it utilizes large-scale unlabeled data and avoids extensive human labeling. Recent years have witnessed great success in applying self-supervised learning to speech recognition, while only limited exploration has been attempted in applying SSL to modeling speaker characteristics. In this paper, we aim to improve the existing SSL framework for speaker representation learning. Two methods are introduced for enhancing unsupervised speaker information extraction. First, we apply multi-task learning to the current SSL framework, integrating an utterance-wise contrastive loss with the SSL objective function. Second, for better speaker discrimination, we propose an utterance mixing strategy for data augmentation, where additional overlapped utterances are created without supervision and incorporated during training. We integrate the proposed methods into the HuBERT framework. Experimental results on the SUPERB benchmark show that the proposed system achieves state-of-the-art performance in universal representation learning, especially for speaker-identification-oriented tasks. An ablation study verifies the efficacy of each proposed method. Finally, we scale up the training dataset to 94 thousand hours of public audio data and achieve further performance improvement in all SUPERB tasks.
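The two training-side ideas in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say how overlapping chunks are selected or scaled, nor how anchors and positives are built for the contrastive loss. The chunk-length cap (`max_ratio`), the dB gain range (`gain_db_range`), the temperature, the loss weight and all function names below are hypothetical. `utterance_mix` overlays a chunk of a different utterance from the same batch without using any labels, and `utterance_contrastive_loss` is a generic InfoNCE loss over utterance-level embeddings whose value is added to the SSL loss in `multitask_loss`.

```python
import numpy as np


def rms(x, eps=1e-8):
    """Root-mean-square amplitude, with eps to avoid division by zero."""
    return float(np.sqrt(np.mean(x ** 2)) + eps)


def utterance_mix(batch, rng, max_ratio=0.5, gain_db_range=(-5.0, 5.0)):
    """Overlay a random chunk of another utterance in the batch onto each utterance.

    batch: float array (B, T) of waveforms, B >= 2.
    Hypothetical defaults: the chunk covers at most max_ratio of T, and its level
    relative to the primary utterance is drawn uniformly from gain_db_range (dB).
    No speaker labels are used; the interfering utterance is just another batch item.
    """
    num_utts, num_samples = batch.shape
    mixed = batch.copy()
    max_len = max(1, int(max_ratio * num_samples))
    for i in range(num_utts):
        j = rng.choice([k for k in range(num_utts) if k != i])  # a different utterance
        length = int(rng.integers(1, max_len + 1))
        src = int(rng.integers(0, num_samples - length + 1))  # where to cut from utt j
        dst = int(rng.integers(0, num_samples - length + 1))  # where to paste into utt i
        chunk = batch[j, src:src + length]
        gain_db = rng.uniform(*gain_db_range)
        # Rescale so the chunk RMS equals the primary RMS times 10^(gain_db / 20).
        scale = rms(batch[i]) / rms(chunk) * 10.0 ** (gain_db / 20.0)
        mixed[i, dst:dst + length] += scale * chunk
    return mixed


def utterance_contrastive_loss(anchor, positive, temperature=0.1):
    """Generic InfoNCE over utterance-level embeddings of shape (B, D).

    Row i of `positive` is the positive for row i of `anchor`; the other rows
    of the batch act as negatives.
    """
    a = anchor / np.linalg.norm(anchor, axis=1, keepdims=True)
    p = positive / np.linalg.norm(positive, axis=1, keepdims=True)
    logits = a @ p.T / temperature                        # (B, B) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))            # cross-entropy on the diagonal


def multitask_loss(ssl_loss, contrastive_loss, weight=0.1):
    """Weighted sum of the frame-level SSL loss and the utterance-wise contrastive loss."""
    return ssl_loss + weight * contrastive_loss


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    waves = rng.standard_normal((4, 16000))   # four 1-second clips at 16 kHz (toy data)
    augmented = utterance_mix(waves, rng)
    emb = rng.standard_normal((4, 256))       # stand-in utterance embeddings
    noisy_view = emb + 0.01 * rng.standard_normal((4, 256))
    total = multitask_loss(2.3, utterance_contrastive_loss(emb, noisy_view))
    print(augmented.shape, round(total, 4))
```

Capping the pasted chunk at a fraction of the utterance is intended to keep the original speaker dominant, so that the frame-level SSL targets (for HuBERT, the pseudo-labels of the primary utterance) remain meaningful; this rationale is an assumption about the design, not a detail stated in the abstract.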
