This paper presents a novel study of parameter-free attentive scoring for speaker verification. Parameter-free scoring provides the flexibility of comparing speaker representations without the need for an accompanying parametric scoring model. Inspired by the attention component in Transformer neural networks, we propose a variant of the scaled dot product attention mechanism to compare enrollment and test segment representations. In addition, this work explores the effect on performance of (i) different types of normalization, (ii) independent versus tied query/key estimation, (iii) varying the number of key-value pairs and (iv) pooling multiple enrollment utterance statistics. Experimental results averaged over four tasks show that a simple parameter-free attentive scoring mechanism can improve the average EER by 10% over the best cosine similarity baseline.
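To make the idea concrete, the following is a minimal sketch of parameter-free attentive scoring between an enrollment and a test representation, where each utterance is summarized by several key-value statistics and compared with scaled dot-product attention. This is an illustrative reconstruction under simplifying assumptions (L2 normalization, tied query/key vectors, a single scoring direction), not the paper's exact formulation; the function name and shapes are hypothetical.

```python
import numpy as np

def attentive_score(enroll, test):
    """Parameter-free attentive score between two utterance representations.

    enroll, test: arrays of shape (num_keys, dim) -- each utterance is
    summarized by multiple key-value statistics rather than one vector.
    (Shapes and tied query/key treatment are assumptions for illustration.)
    """
    # L2-normalize the statistics (one normalization choice the paper explores).
    enroll = enroll / np.linalg.norm(enroll, axis=-1, keepdims=True)
    test = test / np.linalg.norm(test, axis=-1, keepdims=True)

    # Scaled dot-product attention: test statistics act as queries over
    # the enrollment statistics (queries and keys are tied here).
    d = enroll.shape[-1]
    logits = test @ enroll.T / np.sqrt(d)      # (num_test, num_enroll)
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)

    # Attend over enrollment statistics, then score by average similarity.
    attended = weights @ enroll                # (num_test, dim)
    return float(np.mean(np.sum(test * attended, axis=-1)))
```

Because there are no learned parameters, the same scoring function can be applied to any embedding extractor; with a single statistic per utterance the attention weights collapse to 1 and the score reduces to plain cosine similarity.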