This technical report describes the IDLab submission for tracks 1 and 2 of the VoxCeleb Speaker Recognition Challenge 2021 (VoxSRC-21). This speaker verification competition focuses on short-duration test recordings and cross-lingual trials. Currently, both Time Delay Neural Networks (TDNNs) and ResNets achieve state-of-the-art results in speaker verification. We opt to use a system fusion of hybrid architectures in our final submission. An ECAPA-TDNN baseline is enhanced with a 2D convolutional stem to transfer some of the strong characteristics of a ResNet-based model to this hybrid CNN-TDNN architecture. Similarly, we incorporate absolute frequency positional information in the SE-ResNet architectures. All models are trained with a special mini-batch data sampling technique that constructs mini-batches from the data that is most challenging for the system in terms of intra-speaker variability. This intra-speaker variability is mainly caused by differences in language and background conditions between the speaker's utterances. The cross-lingual effects on the speaker verification scores are further compensated by introducing a binary cross-linguality trial feature in the logistic-regression-based system calibration. The final system fusion, with two ECAPA CNN-TDNNs and three SE-ResNets enhanced with frequency positional information, achieved third place on the VoxSRC-21 leaderboard for both track 1 and track 2, with a minDCF of 0.1291 and 0.1313, respectively.
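The calibration step mentioned above can be illustrated with a minimal sketch. It assumes the calibrated log-odds take the form `a * score + b * cross_lingual + c`, where `cross_lingual` is 1 when the enrollment and test utterances are in different languages and 0 otherwise. This is not the authors' implementation: the function names (`fit_calibration`, `calibrate`), the plain gradient-descent fit, and the toy trial values are all hypothetical and chosen only for illustration.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_calibration(scores, cross_lingual, labels, lr=0.5, epochs=2000):
    """Fit logit = a*score + b*cross_lingual + c by batch gradient
    descent on the mean logistic (cross-entropy) loss."""
    a = b = c = 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = grad_c = 0.0
        for s, x, y in zip(scores, cross_lingual, labels):
            err = sigmoid(a * s + b * x + c) - y
            grad_a += err * s
            grad_b += err * x
            grad_c += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
        c -= lr * grad_c / n
    return a, b, c


def calibrate(score, is_cross_lingual, params):
    # Map a raw verification score to calibrated log-odds.
    a, b, c = params
    return a * score + b * is_cross_lingual + c


# Made-up toy trials (NOT challenge data): (raw score, cross-lingual flag, target label).
# Cross-lingual trials are given systematically lower raw scores.
trials = [
    (0.4, 0, 1), (0.5, 0, 1), (0.6, 0, 1),
    (-0.2, 0, 0), (-0.3, 0, 0), (-0.4, 0, 0),
    (0.2, 1, 1), (0.3, 1, 1), (0.4, 1, 1),
    (-0.4, 1, 0), (-0.5, 1, 0), (-0.6, 1, 0),
]
scores = [t[0] for t in trials]
flags = [t[1] for t in trials]
labels = [t[2] for t in trials]

params = fit_calibration(scores, flags, labels)
```

On toy data like this, where cross-lingual trials score lower across the board, the learned offset `b` would be expected to come out positive, so a cross-lingual trial is mapped to higher log-odds than a same-language trial with the same raw score. In practice a library solver would likely replace the hand-written gradient descent.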