Continuous integrate-and-fire (CIF) based models, which use a soft and monotonic alignment mechanism, have been applied successfully to non-autoregressive (NAR) speech recognition, with performance competitive with other NAR methods. However, this alignment learning strategy may suffer from erroneous acoustic boundary estimation, which severely hinders both convergence speed and system performance. In this paper, we propose a boundary and context aware training approach for CIF based NAR models. First, connectionist temporal classification (CTC) spike information is used to guide the learning of acoustic boundaries in the CIF. Second, an additional contextual decoder is placed after the CIF decoder to capture the linguistic dependencies within a sentence. Finally, we adopt the recently proposed Conformer architecture to improve the capacity of acoustic modeling. Experiments on the open-source Mandarin AISHELL-1 corpus show that the proposed method achieves a comparable character error rate (CER) of 4.9% with only 1/24 of the latency of a state-of-the-art autoregressive (AR) Conformer model. Furthermore, when evaluated on an internal 7500-hour Mandarin corpus, our model still outperforms other NAR methods and even matches the AR Conformer model on a challenging real-world noisy test set.
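
To make the soft, monotonic alignment concrete, the following is a minimal sketch of the integrate-and-fire step as it is commonly described for CIF, not the authors' implementation. It assumes a firing threshold of 1.0 and per-frame weights in [0, 1] (so at most one boundary falls inside a frame); the function name `cif_integrate` and its arguments are hypothetical. Weights are accumulated frame by frame, and whenever the running sum reaches the threshold an acoustic boundary is placed: the frame's weight is split, one part completes the current label-level vector and the remainder starts the next one.

```python
def cif_integrate(frames, alphas, threshold=1.0):
    """Illustrative integrate-and-fire pass (hypothetical helper).

    frames: list of encoder output vectors, one list of floats per frame.
    alphas: list of per-frame weights, each assumed to lie in [0, 1].
    Returns the list of label-level vectors emitted at each firing.
    """
    dim = len(frames[0])
    fired = []                  # label-level vectors emitted so far
    acc_weight = 0.0            # weight accumulated since the last firing
    acc_state = [0.0] * dim     # weighted sum of frames since the last firing

    for h, a in zip(frames, alphas):
        if acc_weight + a < threshold:
            # No boundary in this frame: keep integrating.
            acc_weight += a
            acc_state = [s + a * x for s, x in zip(acc_state, h)]
        else:
            # Boundary located in this frame: split its weight in two.
            used = threshold - acc_weight   # part that completes the current label
            fired.append([s + used * x for s, x in zip(acc_state, h)])
            rest = a - used                 # part carried over to the next label
            acc_weight = rest
            acc_state = [rest * x for x in h]

    return fired
```

Because every firing depends on where the accumulated weight crosses the threshold, a misplaced weight shifts the boundary and mixes frames from neighbouring labels into one vector, which is the kind of boundary error the CTC spike guidance in the abstract is meant to reduce.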


