In this paper, we propose a simple but effective method for decoding the output of a Connectionist Temporal Classification (CTC) model using a bidirectional neural language model. The bidirectional language model uses future as well as past information to predict the next output in the sequence. The proposed method, based on bidirectional beam search, uses the CTC greedy decoding output as a noisy representation of the future information. Experiments on the LibriSpeech dataset show that the proposed method outperforms baselines that use unidirectional decoding. In particular, the boost in accuracy is most apparent at the start of a sequence, which is the most error-prone part for existing systems based on unidirectional decoding.
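As a concrete illustration of the CTC greedy decoding step mentioned in the abstract, the following is a minimal sketch of the standard procedure: take the highest-scoring label in each frame, merge consecutive repeats, then drop the blank symbol. The function name `ctc_greedy_decode`, the blank index 0, and the list-of-lists input format are assumptions made for this example, not details taken from the paper. How the resulting sequence is fed to the bidirectional language model as future context is not specified in the abstract and is not shown here.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical choice)


def ctc_greedy_decode(frame_scores, blank=BLANK):
    """Standard CTC greedy decoding over per-frame label scores.

    frame_scores: one list of scores per time frame, indexed by label id.
    Returns the decoded label sequence as a list of label ids.
    """
    # 1. Best label per frame (the greedy, or best-path, alignment).
    best_path = [
        max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores
    ]
    # 2. Merge consecutive repeats of the same label.
    collapsed = [label for label, _ in groupby(best_path)]
    # 3. Remove blanks; a blank between two equal labels keeps them distinct.
    return [label for label in collapsed if label != blank]


# Toy input: 5 frames over 3 labels (0 = blank).
toy_scores = [
    [0.1, 0.8, 0.1],
    [0.1, 0.7, 0.2],
    [0.9, 0.05, 0.05],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
]
decoded = ctc_greedy_decode(toy_scores)
```

In this toy input the best path is 1, 1, blank, 1, 2, so the two leading 1s merge while the blank keeps the third 1 separate. Because each frame is decoded independently, the output can contain errors, which is consistent with the abstract describing it as noisy future information.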