This paper describes the systems developed by the HCCL team for the NIST 2021 Speaker Recognition Evaluation (NIST SRE21). We first explore various state-of-the-art speaker embedding extractors combined with the recently proposed circle loss to obtain discriminative deep speaker embeddings. Since cross-channel and cross-lingual speaker recognition are the key challenges of SRE21, we introduce several techniques to reduce the cross-domain mismatch. Specifically, codec processing and speech enhancement are applied directly to the raw speech to eliminate the codec and environmental noise mismatches. We collectively refer to these methods, which operate directly on the speech to remove relatively explicit mismatches, as data adaptation methods. Experiments show that data adaptation methods achieve a 15% improvement over our baseline. Furthermore, several popular back-end domain adaptation algorithms are applied to the speaker embeddings to alleviate the performance degradation caused by implicit mismatches. Score calibration was a major failure for us in SRE21: calibration with too many parameters easily leads to overfitting.
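For readers unfamiliar with circle loss, the following is a minimal sketch of its pair-wise form as proposed by Sun et al. (2020), computed for a single training sample from its cosine similarities to its own speaker (positives) and to other speakers (negatives). The function name, the margin `m=0.25`, and the scale `gamma=64` are illustrative assumptions; the abstract does not state the hyperparameters or implementation used in these systems.

```python
import math
from typing import Sequence


def _logsumexp(xs: Sequence[float]) -> float:
    # Numerically stable log(sum(exp(x))).
    mx = max(xs)
    return mx + math.log(sum(math.exp(x - mx) for x in xs))


def _softplus(z: float) -> float:
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def circle_loss(sp: Sequence[float], sn: Sequence[float],
                m: float = 0.25, gamma: float = 64.0) -> float:
    """Circle loss for one sample (hypothetical helper, illustrative defaults).

    sp: cosine similarities to positives (same speaker), must be non-empty.
    sn: cosine similarities to negatives (other speakers), must be non-empty.
    """
    if not sp or not sn:
        raise ValueError("need at least one positive and one negative similarity")
    op, on = 1.0 + m, -m            # optimum points for positive / negative scores
    delta_p, delta_n = 1.0 - m, m   # decision margins
    # Adaptive weights alpha = [O_p - s_p]_+ and [s_n - O_n]_+ (treated as
    # constants during back-propagation in the original formulation).
    logit_p = [-gamma * max(op - s, 0.0) * (s - delta_p) for s in sp]
    logit_n = [gamma * max(s - on, 0.0) * (s - delta_n) for s in sn]
    # log(1 + sum_j exp(logit_n) * sum_i exp(logit_p))
    return _softplus(_logsumexp(logit_n) + _logsumexp(logit_p))


if __name__ == "__main__":
    print(circle_loss([0.95], [0.0, -0.1]))  # well separated: small loss
    print(circle_loss([0.1], [0.8]))         # confused: large loss
```

The adaptive weights push each similarity toward its own optimum with a strength proportional to how far it still is from that optimum, which is what distinguishes circle loss from margin-softmax losses that penalize all similarities equally.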