As computers become more powerful and more deeply integrated into our daily lives, the focus is increasingly shifting towards more human-friendly interfaces, making Automatic Speech Recognition (ASR) a central player as the ideal means of interaction with machines. Consequently, interest in speech technology has grown in the last few years, with more systems being proposed and higher accuracy levels being achieved, even surpassing human accuracy. As ASR systems become more powerful, their computational complexity also increases, and hardware support has to keep pace. In this paper, we propose a technique to improve the energy efficiency and performance of ASR systems, focusing on low-power hardware for edge devices. We focus on optimizing the evaluation of the DNN-based acoustic model, which we have observed to be the main bottleneck in state-of-the-art ASR systems, by leveraging run-time information from the beam search. By doing so, we reduce the energy and execution time of the acoustic model evaluation by 25.6% and 25.9%, respectively, with negligible accuracy loss.