Automatic Speech Recognition (ASR) has grown in popularity in recent years. Advances in processor and storage technologies have enabled more sophisticated ASR mechanisms, fueling the development of virtual assistants such as Amazon Alexa, Apple Siri, Microsoft Cortana, and Google Home. Interest in such assistants has, in turn, spurred new developments in ASR research. Despite this popularity, however, there has been no detailed analysis of the training efficiency of modern ASR systems. This mainly stems from three factors: the proprietary nature of many modern applications that depend on ASR, like the ones listed above; the relatively expensive co-processor hardware that big vendors use to accelerate ASR for such applications; and the absence of well-established benchmarks. The goal of this paper is to address the latter two of these challenges. We first describe an ASR model, based on a deep neural network inspired by recent work in this domain, and our experience building it. We then evaluate this model on three CPU-GPU co-processor platforms that represent different budget categories. Our results demonstrate that hardware acceleration yields good results even without high-end equipment. While the most expensive platform (10x the price of the least expensive one) converges to the initial accuracy target 10-30% and 60-70% faster than the other two platforms, respectively, the differences among the platforms almost disappear at slightly higher accuracy targets. In addition, our results highlight both the difficulty of evaluating ASR systems, owing to the complex, long, and resource-intensive nature of model training in this domain, and the importance of establishing benchmarks for ASR.