The outstanding accuracy achieved by modern Automatic Speech Recognition (ASR) systems is quickly making them a mainstream technology. ASR is essential for many applications, such as speech-based assistants, dictation systems, and real-time language translation. However, highly accurate ASR systems are computationally expensive, requiring on the order of billions of arithmetic operations to decode each second of audio, which conflicts with the growing interest in deploying ASR on edge devices. On these devices, hardware acceleration is key to achieving acceptable performance. At the same time, ASR is a rich and fast-changing field, so any overly specialized hardware accelerator may quickly become obsolete. In this paper, we tackle these challenges by proposing ASRPU, a programmable accelerator for on-edge ASR. ASRPU contains a pool of general-purpose cores that execute small pieces of parallel code. Each of these programs computes one part of the overall decoder (e.g., a layer in a neural network). The accelerator automates some carefully chosen parts of the decoder to simplify programming without sacrificing generality. We analyze a modern ASR system implemented on ASRPU and show that this architecture can achieve real-time decoding within a very low power budget.
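To make the execution model more concrete, the following is a minimal software sketch of the idea of a pool of general-purpose workers that run small parallel programs, each computing one part of the decoder (here, one fully connected layer). It is an illustration only, not ASRPU's actual programming interface: the names `dense_kernel`, `run_layer`, `decode_frame`, and `NUM_CORES`, the use of Python threads to stand in for cores, and the choice of a ReLU layer are all assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_CORES = 4  # hypothetical pool size, not a figure from the paper


def dense_kernel(weight_rows, bias_rows, inputs):
    """Small program: compute a slice of one fully connected layer with ReLU."""
    return [
        max(0.0, sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weight_rows, bias_rows)
    ]


def run_layer(pool, weights, bias, inputs, num_cores=NUM_CORES):
    """Split one layer's output rows across the pool and gather results in order."""
    chunk = max(1, -(-len(weights) // num_cores))  # ceiling division
    futures = [
        pool.submit(dense_kernel, weights[i:i + chunk], bias[i:i + chunk], inputs)
        for i in range(0, len(weights), chunk)
    ]
    outputs = []
    for future in futures:
        outputs.extend(future.result())
    return outputs


def decode_frame(layers, features, num_cores=NUM_CORES):
    """Run the layers one after another; each layer is one parallel step."""
    with ThreadPoolExecutor(max_workers=num_cores) as pool:
        activations = features
        for weights, bias in layers:
            activations = run_layer(pool, weights, bias, activations, num_cores)
    return activations
```

In this sketch, each layer is a separate small program whose work items are distributed over the pool, and the surrounding loop plays the role of the fixed control flow that sequences the decoder stages. Swapping in a different layer type would only require a different kernel function, which is the kind of flexibility a programmable design aims for.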