In this paper, we propose a novel method that trains pass-phrase specific deep neural network (PP-DNN) based auto-encoders to create augmented data for text-dependent speaker verification (TD-SV). Each PP-DNN auto-encoder is trained on the utterances of a particular pass-phrase available in the target enrollment set, using two methods: (i) transfer learning and (ii) training from scratch. Next, the feature vectors of a given utterance are fed to the PP-DNNs, and the frame-level output of each PP-DNN is treated as a new set of generated data. The data generated by each PP-DNN is then used to build a TD-SV system, in contrast to the conventional method, which considers only the available evaluation data. The proposed approach can be viewed as transforming the data into pass-phrase specific spaces via the non-linear transformation learned by each PP-DNN. The method thus yields as many TD-SV systems as there are PP-DNNs, one trained separately for each pass-phrase in the evaluation. Finally, the scores of the different TD-SV systems are fused for decision making. Experiments are conducted on the RedDots challenge 2016 database for TD-SV using short utterances. Results show that the proposed method improves performance for both conventional cepstral features and deep bottleneck features under both the Gaussian mixture model - universal background model (GMM-UBM) and i-vector frameworks.
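To make the pipeline concrete, the following is a minimal sketch of the PP-DNN idea: one auto-encoder per pass-phrase trained on frame-level features, each model used to transform an utterance into a pass-phrase specific data stream, and the per-system scores fused at the end. It assumes PyTorch; the feature dimension, network sizes, training loop, and the equal-weight fusion rule are all illustrative assumptions, not the paper's exact configuration.

```python
# Sketch of pass-phrase specific auto-encoders (PP-DNNs) for TD-SV data
# augmentation. All names and hyper-parameters here are hypothetical.
import torch
import torch.nn as nn

FEAT_DIM = 57  # assumed cepstral feature dimension, not from the paper


class PPDNNAutoEncoder(nn.Module):
    """One auto-encoder trained on frames of a single pass-phrase."""

    def __init__(self, feat_dim: int = FEAT_DIM, hidden: int = 512):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.decoder = nn.Linear(hidden, feat_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))


def train_ppdnn(frames: torch.Tensor, epochs: int = 20) -> PPDNNAutoEncoder:
    """Train one PP-DNN from scratch on a (num_frames, feat_dim) tensor
    holding all frames of one pass-phrase from the enrollment set."""
    model = PPDNNAutoEncoder(frames.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(frames), frames)  # reconstruct input frames
        loss.backward()
        opt.step()
    return model


def generate_streams(models, utterance_frames: torch.Tensor):
    """Pass one utterance through every PP-DNN; each output is one new
    data stream that feeds its own TD-SV system (e.g., GMM-UBM or i-vector)."""
    with torch.no_grad():
        return [m(utterance_frames) for m in models]


def fuse_scores(scores):
    """Equal-weight score fusion across the per-PP-DNN TD-SV systems;
    the exact fusion rule is an assumption here."""
    return sum(scores) / len(scores)
```

The transfer-learning variant mentioned above would differ only in initialization: rather than training each PP-DNN from random weights, one would start from a model pre-trained on pooled data and fine-tune it on the frames of a single pass-phrase.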