The paper presents a novel approach to refining similarity scores between input utterances for robust speaker verification. Given the embeddings of a pair of input utterances, a graph model is designed to incorporate additional information from a group of embeddings representing so-called auxiliary speakers. The relations between the input utterances and the auxiliary speakers are represented by the vertices and edges of the graph. The similarity scores are refined by iteratively updating the values of the graph's vertices with an algorithm similar to a random walk on the graph. Through this updating process, the information from the auxiliary speakers is incorporated into determining the relation between the input utterances and hence contributes to the verification decision. We propose to create a set of artificial embeddings during model training; using these generated embeddings as auxiliary speakers, no extra data are required for the graph model at the verification stage. The proposed model is trained in an end-to-end manner within the whole system. Experiments are carried out on the VoxCeleb datasets. The results indicate that incorporating auxiliary speakers via the graph is effective in improving speaker verification performance.
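To make the score-refinement idea concrete, below is a minimal sketch of a random-walk-style refinement over a graph built from the two input embeddings and a set of auxiliary embeddings. It assumes cosine similarity for the edge weights and a walk-with-restart update; the hyperparameters (`alpha`, `num_iters`), the read-out of the refined score, and the function name `refine_score` are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch: random-walk-style refinement of a verification score using
# auxiliary speaker embeddings (illustrative, not the paper's exact method).
import numpy as np

def refine_score(enroll, test, auxiliary, alpha=0.8, num_iters=10):
    """Refine the enroll/test similarity using auxiliary embeddings.

    enroll, test : (d,) embeddings of the two input utterances
    auxiliary    : (K, d) embeddings acting as auxiliary speakers
    """
    # Vertices of the graph: index 0 = enrolment, 1 = test, 2.. = auxiliaries.
    X = np.vstack([enroll, test, auxiliary])
    X = X / np.linalg.norm(X, axis=1, keepdims=True)

    # Edge weights: pairwise cosine similarities, clipped to be non-negative.
    W = np.clip(X @ X.T, 0.0, None)
    np.fill_diagonal(W, 0.0)

    # Row-normalise to obtain a transition matrix for the random walk.
    P = W / W.sum(axis=1, keepdims=True)

    # Initial vertex values: raw similarity of each vertex to the
    # enrolment vertex (column 0 of W).
    s0 = W[:, 0].copy()
    s = s0.copy()

    # Random walk with restart: propagate values along the edges while
    # retaining a fraction (1 - alpha) of the initial scores.
    for _ in range(num_iters):
        s = alpha * (P @ s) + (1.0 - alpha) * s0

    # The refined verification score is the propagated value at the
    # test vertex (index 1).
    return s[1]

# Example usage with random vectors standing in for speaker embeddings.
rng = np.random.default_rng(0)
score = refine_score(rng.normal(size=192), rng.normal(size=192),
                     rng.normal(size=(16, 192)))
print(score)
```

In this sketch the auxiliary vertices act as intermediaries: evidence flows from the enrolment vertex to the test vertex both directly and through auxiliary speakers that are similar to either utterance, which is the mechanism the abstract attributes to the graph model.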