In this report, we describe the Beijing ZKJ-NPU team submission to the VoxCeleb Speaker Recognition Challenge 2021 (VoxSRC-21). We participated in the fully supervised speaker verification tracks, Track 1 and Track 2. In the challenge, we explored a range of advanced neural network structures with different pooling layers and objective loss functions. In addition, we introduced the ResNet-DTCF, CoAtNet and PyConv networks to improve the performance of CNN-based speaker embedding models. Moreover, we applied embedding normalization and score normalization at the evaluation stage. By fusing 11 and 14 systems for Tracks 1 and 2, respectively, we reached final best performances (minDCF/EER) on the evaluation trials of 0.1205/2.8160% and 0.1175/2.8400%. With our submission, we placed second in the challenge on both tracks.
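The abstract does not say which embedding or score normalization variants were used. As an illustration only, the sketch below assumes L2 length normalization of embeddings, cosine scoring, and symmetric score normalization (S-norm) against a cohort of impostor embeddings. The function names, the cohort, and the random data are hypothetical and are not taken from the report.

```python
import numpy as np


def length_normalize(x):
    """Scale each embedding (last axis) to unit L2 norm."""
    x = np.asarray(x, dtype=np.float64)
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def cosine_score(enroll, test):
    """Cosine similarity between two single embeddings."""
    return float(np.dot(length_normalize(enroll), length_normalize(test)))


def s_norm(enroll, test, cohort):
    """Symmetric score normalization (assumed variant, not the report's).

    The raw trial score is standardized twice, once with the statistics of
    the enrollment-vs-cohort scores and once with the statistics of the
    test-vs-cohort scores, and the two z-scores are averaged.
    """
    cohort = length_normalize(cohort)
    raw = cosine_score(enroll, test)
    enroll_scores = cohort @ length_normalize(enroll)
    test_scores = cohort @ length_normalize(test)
    z_enroll = (raw - enroll_scores.mean()) / enroll_scores.std()
    z_test = (raw - test_scores.mean()) / test_scores.std()
    return 0.5 * (z_enroll + z_test)


# Toy data: random vectors stand in for real speaker embeddings.
rng = np.random.default_rng(0)
cohort = rng.normal(size=(200, 64))               # hypothetical impostor cohort
enroll = rng.normal(size=64)                      # enrollment embedding
same = enroll + 0.1 * rng.normal(size=64)         # near-duplicate ("target") trial
different = rng.normal(size=64)                   # unrelated ("non-target") trial

target_score = s_norm(enroll, same, cohort)
nontarget_score = s_norm(enroll, different, cohort)
```

In this toy setup the target trial should receive a clearly higher normalized score than the non-target trial. In practice the cohort is usually drawn from training speakers, and adaptive variants keep only the top-scoring cohort entries for each trial side.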