Meta-learning (ML) has recently become a research hotspot in speaker verification (SV). In this paper, we introduce two methods to improve meta-learning training for SV. In the first method, a backbone embedding network is first jointly trained with the conventional cross-entropy loss and the prototypical networks (PN) loss. Then, inspired by speaker adaptive training in speech recognition, additional transformation coefficients are trained with only the PN loss. These transformation coefficients are used to modify the original backbone embedding network in the x-vector extraction process. In the second method, the random erasing (RE) data augmentation technique is applied to all support samples in each episode to construct positive pairs, and a contrastive loss between the augmented and the original support samples is added to the training objective. Experiments are carried out on the Speakers in the Wild (SITW) and VOiCES databases. Both methods obtain consistent improvements over existing meta-learning training frameworks, and combining the two yields further improvements on both databases.
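To make the episodic objective concrete, the sketch below illustrates a prototypical-network loss over speaker prototypes and an InfoNCE-style contrastive term between original and random-erased support embeddings. This is a minimal PyTorch illustration under our own assumptions: the function names, the Euclidean distance metric, the exact contrastive formulation, and the loss weighting are illustrative and not taken from the paper itself.

import torch
import torch.nn.functional as F

def prototypical_loss(support_emb, query_emb, query_labels):
    # support_emb:  (n_spk, n_support, dim) support embeddings per speaker
    # query_emb:    (n_query, dim) query embeddings
    # query_labels: (n_query,) speaker indices in [0, n_spk)
    prototypes = support_emb.mean(dim=1)          # speaker centroids, (n_spk, dim)
    dists = torch.cdist(query_emb, prototypes)    # Euclidean distance to each prototype
    log_probs = F.log_softmax(-dists, dim=1)      # nearer prototype -> higher probability
    return F.nll_loss(log_probs, query_labels)

def contrastive_loss(orig_emb, aug_emb, temperature=0.1):
    # Treat (augmented_i, original_i) as the positive pair and all other
    # original support samples in the episode as negatives (InfoNCE-style;
    # the paper's exact contrastive loss may differ).
    orig = F.normalize(orig_emb, dim=1)
    aug = F.normalize(aug_emb, dim=1)
    logits = aug @ orig.t() / temperature         # (n, n) cosine-similarity logits
    targets = torch.arange(orig.size(0), device=orig.device)
    return F.cross_entropy(logits, targets)

# Joint objective per episode (weights are hypothetical):
#   total_loss = ce_loss + pn_loss + lambda_c * contrastive_loss(orig_emb, aug_emb)

In joint training, the cross-entropy term is computed from a speaker classification head on the same embeddings, while the PN and contrastive terms are computed episodically as above.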