Attention Back-end for Automatic Speaker Verification with Multiple Enrollment Utterances

A back-end model is a key element of modern speaker verification systems. Probabilistic linear discriminant analysis (PLDA) has been widely used as the back-end model, but it cannot fully exploit multiple utterances from an enrollment speaker. In this paper, we propose a novel attention-based back-end model that can be used for both text-independent (TI) and text-dependent (TD) speaker verification with multiple enrollment utterances. It employs scaled-dot self-attention and feed-forward self-attention networks as architectures that learn the intra-relationships of the enrollment utterances. To verify the proposed attention back-end, we combine it with two completely different but dominant speaker encoders, a time delay neural network (TDNN) and a ResNet, trained using the additive-margin-based softmax loss and the uniform loss, and compare the results with conventional PLDA or cosine scoring. Experimental results on the multi-genre CN-Celeb dataset show that the proposed approach outperforms PLDA scoring with TDNN and cosine scoring with ResNet by around 14.1% and 7.8% in relative EER, respectively. An ablation experiment is also reported to examine the impact of several significant hyper-parameters of the proposed back-end model.
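To make the idea of attending over multiple enrollment utterances concrete, the following is a minimal sketch, not the paper's model. It assumes identity query/key/value projections (the proposed back-end learns its parameters), uses mean pooling in place of the feed-forward self-attention stage, and scores with plain cosine similarity. All function names and the toy three-dimensional embeddings are hypothetical.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(embeddings):
    """Scaled dot-product self-attention over enrollment embeddings.

    Assumption: queries, keys, and values are the raw embeddings
    (identity projections); a trained back-end would learn these.
    """
    d = len(embeddings[0])
    scale = math.sqrt(d)
    attended = []
    for q in embeddings:
        # Attention weights of this utterance over all enrollment utterances.
        weights = softmax([dot(q, k) / scale for k in embeddings])
        attended.append(
            [sum(w * v[i] for w, v in zip(weights, embeddings)) for i in range(d)]
        )
    return attended


def mean_pool(vectors):
    """Average a list of vectors into one speaker-level vector (assumption)."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def score(enroll, test):
    """Verification score: attend over enrollments, pool, compare to test."""
    return cosine(mean_pool(self_attention(enroll)), test)


# Toy data (hypothetical): three enrollment embeddings from one speaker.
enroll = [[1.0, 0.0, 0.2], [0.9, 0.1, 0.0], [0.8, 0.0, 0.1]]
target_score = score(enroll, [1.0, 0.0, 0.1])     # test close to the speaker
nontarget_score = score(enroll, [0.0, 1.0, 0.0])  # test far from the speaker
```

In this sketch each enrollment utterance is re-expressed as a weighted combination of all enrollment utterances before pooling, which is the sense in which self-attention can model their intra-relationships. With real speaker embeddings, the projections and pooling would be trained rather than fixed.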