The growing amount of available data and more affordable hardware have opened the door to Deep Learning (DL). Owing to its rapid advancement and ever-growing popularity, DL has begun to spread into almost every field where machine learning is applicable, displacing the traditional state-of-the-art methods. While many researchers in the speaker recognition area have also started to replace the former state-of-the-art methods with DL techniques, some of the traditional i-vector-based methods are still state-of-the-art in the context of text-independent speaker verification (TI-SV). In this paper, we discuss Google's most recent generalized end-to-end (GE2E) DL technique for TI-SV, which is based on Long Short-Term Memory (LSTM) units, and compare different scenarios and aspects, including utterance duration, training time, and accuracy, to show that our method outperforms the traditional methods.