Improving non-autoregressive end-to-end speech recognition with pre-trained acoustic and language models

While Transformers have achieved promising results in end-to-end (E2E) automatic speech recognition (ASR), their autoregressive (AR) structure is a bottleneck for fast decoding. For real-world deployment, ASR systems should be both highly accurate and fast at inference. Non-autoregressive (NAR) models have become a popular alternative because of their fast inference speed, but they still fall behind AR systems in recognition accuracy. To meet both demands, we propose a NAR CTC/attention model that uses both a pre-trained acoustic model and a pre-trained language model: wav2vec2.0 and BERT. To bridge the modality gap between the speech and text representations obtained from the pre-trained models, we design a novel modality conversion mechanism, which is more suitable for logographic languages. During inference, we employ a CTC branch to generate a target length, which enables BERT to predict tokens in parallel. We also design a cache-based CTC/attention joint decoding method to improve recognition accuracy while keeping decoding fast. Experimental results show that the proposed NAR model greatly outperforms our strong wav2vec2.0 CTC baseline (15.1% relative CER reduction on AISHELL-1). The proposed NAR model significantly surpasses previous NAR systems on the AISHELL-1 benchmark and shows potential for English tasks.
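The abstract says a CTC branch generates the target length that lets BERT predict all tokens in parallel, but it does not spell out how that length is obtained. The following is a minimal sketch assuming standard greedy CTC decoding (take the per-frame argmax, merge consecutive repeats, drop blanks) and using the length of the collapsed sequence as the target length. The function names, the blank id of 0, and the toy frame sequence are all hypothetical and are not taken from the paper.

```python
from itertools import groupby

BLANK_ID = 0  # assumption: index 0 is the CTC blank symbol


def ctc_greedy_collapse(frame_ids, blank_id=BLANK_ID):
    """Collapse per-frame argmax ids: merge consecutive repeats, then drop blanks."""
    return [tok for tok, _ in groupby(frame_ids) if tok != blank_id]


def target_length_from_ctc(frame_ids, blank_id=BLANK_ID):
    """Number of tokens a parallel decoder would be asked to predict (sketch)."""
    return len(ctc_greedy_collapse(frame_ids, blank_id))


if __name__ == "__main__":
    # Toy per-frame argmax output; a real system would get this from the CTC head.
    frames = [0, 7, 7, 0, 0, 3, 3, 3, 0, 7, 0]
    print(ctc_greedy_collapse(frames))      # collapsed token ids
    print(target_length_from_ctc(frames))   # length passed to the parallel decoder
```

Because the length is fixed before token prediction starts, every output position can be filled in a single forward pass instead of one decoding step per token, which is where the NAR speed advantage comes from.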

