Recent advances in unsupervised speech representation learning have produced new approaches and set new state-of-the-art results across diverse speech processing tasks. This paper investigates the use of wav2vec 2.0 deep speech representations for the speaker recognition task. The proposed procedure for fine-tuning wav2vec 2.0 with a simple TDNN and statistics pooling back-end, trained with an additive angular margin loss, yields a deep speaker embedding extractor that generalizes well across different domains. It is concluded that the Contrastive Predictive Coding pretraining scheme efficiently exploits the power of unlabeled data and thus opens the door to powerful transformer-based speaker recognition systems. The experimental results obtained in this study demonstrate that fine-tuning can be performed on relatively small and clean datasets. Applying data augmentation during fine-tuning provides additional performance gains in speaker verification. In this study, speaker recognition systems were analyzed on a wide range of well-known verification protocols: the VoxCeleb1 cleaned test set, the NIST SRE 18 development set, the NIST SRE 2016 and NIST SRE 2019 evaluation sets, the VOiCES evaluation set, and the NIST 2021 SRE and CTS challenge sets.
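For concreteness, the sketch below shows one way the described pipeline could be assembled in PyTorch: a pretrained wav2vec 2.0 encoder (via the Hugging Face `transformers` library), a small TDNN with statistics pooling as the back-end, and an additive angular margin (AAM) softmax loss. This is a minimal illustration, not the paper's exact configuration: the checkpoint name `facebook/wav2vec2-base`, the single-layer TDNN, the layer widths, and the margin/scale values `m=0.2`, `s=30.0` are all assumed for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import Wav2Vec2Model  # assumed checkpoint source


class StatsPooling(nn.Module):
    """Statistics pooling: concatenate per-utterance mean and std over time."""
    def forward(self, x):                     # x: (batch, time, feat)
        return torch.cat([x.mean(dim=1), x.std(dim=1)], dim=1)


class Wav2Vec2SpeakerEmbedder(nn.Module):
    """wav2vec 2.0 front-end + simple TDNN + stats pooling -> speaker embedding."""
    def __init__(self, emb_dim=256, hidden=512, ckpt="facebook/wav2vec2-base"):
        super().__init__()
        self.backbone = Wav2Vec2Model.from_pretrained(ckpt)
        feat = self.backbone.config.hidden_size
        # One TDNN layer (Conv1d over time) stands in for the paper's
        # "simple TDNN" back-end, whose exact depth is not given here.
        self.tdnn = nn.Sequential(
            nn.Conv1d(feat, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.BatchNorm1d(hidden),
        )
        self.pool = StatsPooling()
        self.embedding = nn.Linear(2 * hidden, emb_dim)

    def forward(self, wav):                   # wav: (batch, samples), 16 kHz
        h = self.backbone(wav).last_hidden_state         # (batch, time, feat)
        h = self.tdnn(h.transpose(1, 2)).transpose(1, 2)
        return self.embedding(self.pool(h))              # (batch, emb_dim)


def aam_softmax_loss(emb, weight, labels, s=30.0, m=0.2):
    """Additive angular margin softmax on L2-normalized embeddings.

    weight: (num_speakers, emb_dim) class-center matrix; s and m are the
    usual AAM scale and margin hyperparameters (values assumed here).
    """
    cos = F.linear(F.normalize(emb), F.normalize(weight))
    cos = cos.clamp(-1 + 1e-7, 1 - 1e-7)
    target = torch.cos(torch.acos(cos) + m)   # add margin m to the target angle
    onehot = F.one_hot(labels, weight.size(0)).float()
    logits = s * (onehot * target + (1 - onehot) * cos)
    return F.cross_entropy(logits, labels)
```

At verification time the classification head is discarded and the extracted embeddings are compared directly, typically with cosine similarity between the enrollment and test utterances.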