In this letter, we propose a vocal tract length (VTL) perturbation method for text-dependent speaker verification (TD-SV), in which a set of TD-SV systems is trained, one for each VTL factor, and score-level fusion is applied to make the final decision. Next, we explore the bottleneck (BN) feature extracted from deep neural networks trained with a self-supervised objective, autoregressive predictive coding (APC), for TD-SV and compare it with the well-studied speaker-discriminant BN feature. The proposed VTL method is then applied to the APC and speaker-discriminant BN features. Finally, we combine, in the score domain, the VTL perturbation systems trained on Mel-frequency cepstral coefficient (MFCC) features and on the two BN features. Experiments are performed on the RedDots challenge 2016 database for TD-SV using short utterances, with Gaussian mixture model-universal background model (GMM-UBM) and i-vector techniques. Results show that the proposed methods significantly outperform the baselines.
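Score-level fusion over the per-factor systems can be illustrated with a minimal sketch. The abstract does not state the fusion rule, so the equal-weight average below is an assumption, as are the function name `fuse_scores`, the warping factors 0.9, 1.0 and 1.1, and the example scores, which are made up and are not results from the letter.

```python
from typing import Dict, List, Optional, Sequence


def fuse_scores(
    scores_by_factor: Dict[float, Sequence[float]],
    weights: Optional[Dict[float, float]] = None,
) -> List[float]:
    """Fuse per-trial scores from several systems by a weighted average.

    scores_by_factor maps a (hypothetical) VTL factor to the trial scores
    produced by the TD-SV system trained with that factor.
    """
    factors = sorted(scores_by_factor)
    if not factors:
        raise ValueError("need at least one system")
    n_trials = len(scores_by_factor[factors[0]])
    if any(len(scores_by_factor[f]) != n_trials for f in factors):
        raise ValueError("all systems must score the same trials")
    if weights is None:
        # Equal weights are an assumption, not the rule used in the letter.
        weights = {f: 1.0 for f in factors}
    total = sum(weights[f] for f in factors)
    return [
        sum(weights[f] * scores_by_factor[f][i] for f in factors) / total
        for i in range(n_trials)
    ]


# Made-up scores for three trials from three per-factor systems.
example = {
    0.9: [1.0, -2.0, 0.5],
    1.0: [2.0, -1.0, 0.0],
    1.1: [3.0, -3.0, 1.0],
}
fused = fuse_scores(example)
```

Each fused score would then be compared against a decision threshold to accept or reject the trial. The same function could also combine systems built on different features (MFCC and the two BN features), since it only requires that every system scores the same list of trials.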