The scarcity of labeled far-field speech constrains the training of high-performing far-field speaker verification systems. Fine-tuning a model pre-trained on large-scale near-field speech substantially outperforms training from scratch. However, fine-tuning suffers from two limitations: catastrophic forgetting and overfitting. In this paper, we propose a weight transfer regularization (WTR) loss that constrains the distance between the weights of the model pre-trained on large-scale near-field speech and those of the model fine-tuned on a small amount of far-field speech. With the WTR loss, the fine-tuning process takes advantage of the discriminative ability previously acquired from the large-scale near-field speech without catastrophic forgetting. We also use PAC-Bayes generalization theory to analyze the generalization bound of the model fine-tuned with the WTR loss. The analysis indicates that the WTR term gives the fine-tuned model a tighter generalization upper bound. Moreover, we explore three norm distances for weight transfer: the L1-norm, L2-norm, and Max-norm distances. Finally, we evaluate the effectiveness of the WTR loss on the VoxCeleb (pre-training) and FFSVC (fine-tuning) datasets.
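
To make the idea of a weight-distance penalty concrete, the following is a minimal sketch rather than the paper's implementation. It assumes that weights are given as flat lists of floats, that the penalty is simply added to the task loss with a hypothetical scaling factor `alpha`, and that "Max-norm distance" means the largest absolute element-wise difference. The abstract does not specify the exact form of the loss (for example squared versus unsquared norms, or per-layer weighting), so the function names and these choices are illustrative assumptions.

```python
import math


def weight_distance(pretrained, finetuned, norm="l2"):
    """Distance between two flat weight vectors under the chosen norm."""
    if len(pretrained) != len(finetuned):
        raise ValueError("weight vectors must have the same length")
    # Element-wise absolute difference between fine-tuned and pre-trained weights.
    diffs = [abs(w - w0) for w, w0 in zip(finetuned, pretrained)]
    if norm == "l1":
        return sum(diffs)
    if norm == "l2":
        return math.sqrt(sum(d * d for d in diffs))
    if norm == "max":
        # Assumption: max-norm taken as the largest absolute difference.
        return max(diffs, default=0.0)
    raise ValueError(f"unknown norm: {norm}")


def wtr_loss(task_loss, pretrained, finetuned, alpha=0.01, norm="l2"):
    """Task loss plus a penalty on the drift away from the pre-trained weights.

    `alpha` is a hypothetical hyperparameter, not a value from the paper.
    """
    return task_loss + alpha * weight_distance(pretrained, finetuned, norm)
```

In this sketch, a larger `alpha` keeps the fine-tuned weights closer to the pre-trained ones, while `alpha = 0` reduces to ordinary fine-tuning.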