The HCCL System for the NIST SRE21

This paper describes the systems developed by the HCCL team for the NIST 2021 Speaker Recognition Evaluation (NIST SRE21). We first explore various state-of-the-art speaker embedding extractors combined with a novel circle loss to obtain discriminative deep speaker embeddings. Because cross-channel and cross-lingual speaker recognition are the key challenges of SRE21, we introduce several techniques to reduce the cross-domain mismatch. Specifically, codec processing and speech enhancement are applied directly to the raw speech to eliminate the codec and environmental-noise mismatch. We refer to methods that work directly on speech to eliminate these relatively explicit mismatches collectively as data adaptation methods. Experiments show that data adaptation methods achieve a 15% improvement over our baseline. Furthermore, some popular back-end domain adaptation algorithms are applied to the speaker embeddings to alleviate the speaker recognition performance degradation caused by the implicit mismatch. Score calibration was a major failure for us in SRE21, because score calibration with too many parameters easily leads to overfitting.
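The abstract names circle loss but does not give its form. As a rough illustration only, the sketch below follows the generally published circle loss formulation for a single anchor, using plain Python. The function names, the margin `m=0.25`, and the scale `gamma=256.0` are assumptions chosen for illustration; they are not taken from the HCCL system, whose variant and hyperparameters are not stated here.

```python
import math

def _logsumexp(xs):
    # Numerically stable log(sum(exp(x))) for a non-empty list.
    top = max(xs)
    return top + math.log(sum(math.exp(x - top) for x in xs))

def _softplus(z):
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def circle_loss(sp, sn, m=0.25, gamma=256.0):
    """Circle loss for one anchor (illustrative sketch).

    sp: non-empty list of within-class (same speaker) cosine similarities.
    sn: non-empty list of between-class (different speaker) similarities.
    m, gamma: assumed margin and scale, not the HCCL settings.
    """
    o_p, o_n = 1.0 + m, -m          # optima for positive / negative scores
    delta_p, delta_n = 1.0 - m, m   # decision margins
    # Each score gets a self-paced weight: the further a score is from
    # its optimum, the larger its weight (clamped at zero).
    logit_p = [-gamma * max(o_p - s, 0.0) * (s - delta_p) for s in sp]
    logit_n = [gamma * max(s - o_n, 0.0) * (s - delta_n) for s in sn]
    # log(1 + sum_j exp(logit_n[j]) * sum_i exp(logit_p[i]))
    return _softplus(_logsumexp(logit_n) + _logsumexp(logit_p))

# A well-separated anchor yields a near-zero loss; a confused one does not.
easy = circle_loss(sp=[0.95], sn=[0.05])
hard = circle_loss(sp=[0.30], sn=[0.70])
print(easy < hard)
```

In a real training setup the similarities would come from a batch of speaker embeddings and the loss would be written in an automatic-differentiation framework; the plain-Python version is meant only to make the weighting and margins concrete.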

