In this work, we introduce metric learning (ML) to enhance deep embedding learning for text-independent speaker verification (SV). Specifically, the deep speaker embedding network is trained jointly with a conventional cross-entropy loss and an auxiliary pair-based ML loss. For the auxiliary ML task, the training samples of a mini-batch are first arranged into pairs; positive and negative pairs are then selected and weighted according to their own and relative similarities; finally, the auxiliary ML loss is computed from the similarities of the selected pairs. To evaluate the proposed method, we conduct experiments on the Speakers in the Wild (SITW) dataset. The results demonstrate the effectiveness of the proposed method.
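The pair-based auxiliary loss described above can be sketched roughly as follows. The abstract does not give the exact selection rule, weighting scheme, or loss form, so this sketch borrows a multi-similarity-style formulation as a stand-in: pairs are mined by comparing each pair's similarity with the similarities of the opposite-label pairs of the same anchor, and the weighting arises implicitly through the log-sum-exp terms. The function names, the hyperparameters `alpha`, `beta`, `lam`, and `margin`, and the weighted-sum combination with cross entropy (`ml_weight`) are all assumptions made for illustration, not values or choices taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pair_based_ml_loss(embeddings, labels, alpha=2.0, beta=50.0, lam=0.5, margin=0.1):
    """Multi-similarity-style pair loss over one mini-batch (illustrative).

    alpha, beta, lam and margin are hypothetical hyperparameters.
    """
    n = len(embeddings)
    # Step 1: arrange the mini-batch into pairs via a full similarity matrix.
    sim = [[cosine_similarity(embeddings[i], embeddings[j]) for j in range(n)]
           for i in range(n)]

    total = 0.0
    for i in range(n):
        pos = [sim[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        neg = [sim[i][j] for j in range(n) if labels[j] != labels[i]]
        if not pos or not neg:
            continue  # this anchor has no positive or no negative pairs

        # Step 2: select pairs by relative similarity. A negative is kept if it
        # is nearly as similar as the least similar positive; a positive is kept
        # if it is nearly as dissimilar as the most similar negative.
        hard_neg = [s for s in neg if s > min(pos) - margin]
        hard_pos = [s for s in pos if s < max(neg) + margin]

        # Step 3: compute the loss from the selected pairs. The log-sum-exp form
        # gives harder pairs a larger share of the gradient (implicit weighting).
        pos_term = math.log1p(sum(math.exp(-alpha * (s - lam)) for s in hard_pos)) / alpha
        neg_term = math.log1p(sum(math.exp(beta * (s - lam)) for s in hard_neg)) / beta
        total += pos_term + neg_term

    return total / n


def combined_loss(ce_loss, ml_loss, ml_weight=1.0):
    """Assumed weighted sum of the cross-entropy loss and the auxiliary ML loss."""
    return ce_loss + ml_weight * ml_loss


# Toy usage: four 2-D "speaker embeddings" from two speakers.
batch = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
speaker_ids = [0, 0, 1, 1]
aux = pair_based_ml_loss(batch, speaker_ids)
print(combined_loss(ce_loss=1.0, ml_loss=aux, ml_weight=0.5))
```

In this toy batch the two speakers are already perfectly separated, so no pair survives the selection step and the auxiliary term contributes nothing; shuffling the labels so that same-speaker embeddings are dissimilar makes the auxiliary loss positive.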