The availability of open-source software plays a remarkable role in automatic speech recognition (ASR). Kaldi, for instance, is widely used to develop state-of-the-art offline and online ASR systems. This paper describes ExKaldi-RT, an online ASR toolkit built on Kaldi and the Python language. ExKaldi-RT provides tools for building a real-time audio stream pipeline, extracting acoustic features, transmitting packets over a remote connection, estimating acoustic probabilities with a neural network, and online decoding. While similar functions are already available in tools built on Kaldi, a key feature of ExKaldi-RT is that it works entirely in Python, giving developers of online ASR systems an easy-to-use interface for exploiting original research, for example by applying neural network-based signal processing and acoustic models trained with deep learning frameworks. We performed benchmark experiments on the minimum LibriSpeech corpus and showed that ExKaldi-RT can achieve competitive ASR performance in real time.