This report describes the NPU-HC speaker verification system submitted to the O-COCOSDA Multi-lingual Speaker Verification (MSV) Challenge 2022, which focuses on developing speaker verification systems for low-resource Asian languages. We participate in the I-MSV track, which targets speaker verification for various Indian languages. We first explore different neural network frameworks for low-resource speaker verification. We then use vanilla fine-tuning and weight transfer fine-tuning to adapt models pre-trained on out-of-domain data to the in-domain Indian dataset. Specifically, weight transfer fine-tuning constrains the distance between the weights of the pre-trained model and those of the fine-tuned model, which retains the discriminative ability acquired from large-scale out-of-domain datasets while avoiding both catastrophic forgetting and overfitting. Finally, score fusion is adopted to further improve performance. With these contributions, we obtain an equal error rate (EER) of 0.223% on the public evaluation set, ranking 2nd on the leaderboard. On the private evaluation set, our submitted systems achieve EERs of 2.123% and 0.630% on the constrained and unconstrained sub-tasks of the I-MSV track, ranking 1st and 3rd, respectively.
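The weight-distance constraint described above can be illustrated with a minimal sketch. The abstract does not state which distance measure or weighting is used, so the squared L2 distance, the coefficient `lam`, and the function names `weight_transfer_penalty` and `total_loss` are all assumptions made for illustration, not the submitted system's implementation.

```python
from typing import Dict, List

# Toy representation: parameter name -> flat list of weight values.
Weights = Dict[str, List[float]]


def weight_transfer_penalty(finetuned: Weights, pretrained: Weights) -> float:
    """Squared L2 distance between fine-tuned and pre-trained weights.

    Assumption: squared L2 is used here only as one plausible distance.
    """
    total = 0.0
    for name, ref in pretrained.items():
        cur = finetuned[name]
        total += sum((c - r) ** 2 for c, r in zip(cur, ref))
    return total


def total_loss(task_loss: float, finetuned: Weights, pretrained: Weights,
               lam: float = 0.01) -> float:
    """Task loss plus a penalty that keeps weights near the pre-trained model.

    `lam` is a hypothetical trade-off coefficient: larger values keep the
    fine-tuned model closer to the pre-trained one.
    """
    return task_loss + lam * weight_transfer_penalty(finetuned, pretrained)


pretrained = {"w": [1.0, 2.0]}
finetuned = {"w": [1.5, 2.0]}
loss = total_loss(1.0, finetuned, pretrained, lam=2.0)
```

With `lam = 0` this reduces to vanilla fine-tuning. In a real training loop the same penalty would typically be computed over the model's parameter tensors and added to the speaker classification loss at each step.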