以国际投资框架为基础的非自动递后端对端ASR培训 (Boundary and Context Aware Training for CIF-based Non-Autoregressive End-to-end ASR) - 专知论文

会员服务 ·

0

Conformer · MoDELS · 语音识别 · Continuity · 端到端 ·

2021 年 9 月 26 日

Boundary and Context Aware Training for CIF-based Non-Autoregressive End-to-end ASR

翻译：以国际投资框架为基础的非自动递后端对端ASR培训

Fan Yu,Haoneng Luo,Pengcheng Guo,Yuhao Liang,Zhuoyuan Yao,Lei Xie,Yingying Gao,Leijing Hou,Shilei Zhang

from arxiv, 5 pages,4 figures

Continuous integrate-and-fire (CIF) based models, which use a soft and monotonic alignment mechanism, have been well applied in non-autoregressive (NAR) speech recognition with competitive performance compared with other NAR methods. However, such an alignment learning strategy may suffer from an erroneous acoustic boundary estimation, severely hindering the convergence speed as well as the system performance. In this paper, we propose a boundary and context aware training approach for CIF based NAR models. Firstly, the connectionist temporal classification (CTC) spike information is utilized to guide the learning of acoustic boundaries in the CIF. Besides, an additional contextual decoder is introduced behind the CIF decoder, aiming to capture the linguistic dependencies within a sentence. Finally, we adopt a recently proposed Conformer architecture to improve the capacity of acoustic modeling. Experiments on the open-source Mandarin AISHELL-1 corpus show that the proposed method achieves a comparable character error rates (CERs) of 4.9% with only 1/24 latency compared with a state-of-the-art autoregressive (AR) Conformer model. Futhermore, when evaluating on an internal 7500 hours Mandarin corpus, our model still outperforms other NAR methods and even reaches the AR Conformer model on a challenging real-world noisy test set.

翻译：持续整合和火灾模型(CIF)基于持续整合和火灾模型(CIF)使用软和单调校准机制,在非航空(NAR)语音识别中,与其他NAR方法相比,在竞争性性能上,在非航空(NAR)语音识别中很好地应用了竞争性表现,但是,这种校准学习战略可能因声音边界估计错误而受到影响,严重妨碍了趋同速度和系统性能。在本文件中,我们为基于CIF的NAR模型提出了一个边界和背景意识培训方法。首先,使用连接时间分类(CT)峰值信息来指导CIF的声波边界学习。此外,CIF解码后还引入了额外的背景解码器,目的是在句子内捕捉语言依赖性。最后,我们采用了最近提出的统一结构,以提高声学模型的能力。在开放源的Mandarin ASHELL-1系列实验中显示,拟议方法达到4.9%的可比性差率(CERs),而相对于状态自动反射模式(AR Constold)模型。Fermormor-formagistring a realstalstalstal stall agilling ontostation on romogy set roduction on 7 hard set roduction onstalstal set set rogymal setmal setmet se setmal setdal)。

0

相关内容

Conformer

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

【Yoshua Bengio新论文】多任务自监督学习语音识别，MULTI-TASK SELF-SUPERVISED LEARNING FOR ROBUST SPEECH RECOGNITION

【Yoshua Bengio新论文】多任务自监督学习语音识别，MULTI-TASK SELF-SUPERVISED LEARNING FOR ROBUST SPEECH RECOGNITION

专知会员服务

39+阅读 · 2020年1月30日

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

专知会员服务

43+阅读 · 2020年1月28日

【中科院自动化所】序列到序列语音识别的无监督预训练（Unsupervised pre-training for sequence to sequence speech recognition）

【中科院自动化所】序列到序列语音识别的无监督预训练（Unsupervised pre-training for sequence to sequence speech recognition）

专知会员服务

33+阅读 · 2020年1月5日

【ICLR2020 预训练的百科全书】弱监督的知识-预训练的语言模型（PRETRAINED ENCYCLOPEDIA: WEAKLY SUPERVISED KNOWLEDGE-PRETRAINED LANGUAGE MODEL）

【ICLR2020 预训练的百科全书】弱监督的知识-预训练的语言模型（PRETRAINED ENCYCLOPEDIA: WEAKLY SUPERVISED KNOWLEDGE-PRETRAINED LANGUAGE MODEL）

专知会员服务

25+阅读 · 2019年12月26日

【元学习 | ICASSP2020提交论文】学习低资源语音识别，国立台湾大学 | 李宏毅

【元学习 | ICASSP2020提交论文】学习低资源语音识别，国立台湾大学 | 李宏毅

专知会员服务

57+阅读 · 2019年11月21日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

最强NLP预训练模型库PyTorch-Transformers正式开源！支持6个预训练框架，27个预训练模型

最强NLP预训练模型库PyTorch-Transformers正式开源！支持6个预训练框架，27个预训练模型

AI前线

12+阅读 · 2019年7月22日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

【论文推荐】最新八篇情感分析相关论文—Pair-wise判别器、多模态情感分析、上下文语境、Gated 卷积网络

【论文推荐】最新八篇情感分析相关论文—Pair-wise判别器、多模态情感分析、上下文语境、Gated 卷积网络

专知

20+阅读 · 2018年6月29日

【论文推荐】最新八篇情感分析相关论文—注意力网络、多模态情感分析、情感分析局限性、跨语言情感分类、多语言情感分析

【论文推荐】最新八篇情感分析相关论文—注意力网络、多模态情感分析、情感分析局限性、跨语言情感分类、多语言情感分析

专知

52+阅读 · 2018年6月28日

【论文推荐】最新6篇目标跟踪相关论文—动态记忆网络、相关滤波器、单次学习、相关、循环自回归网络、三维多目标

【论文推荐】最新6篇目标跟踪相关论文—动态记忆网络、相关滤波器、单次学习、相关、循环自回归网络、三维多目标

专知

7+阅读 · 2018年3月21日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

Transformer-S2A: Robust and Efficient Speech-to-Animation

Transformer-S2A: Robust and Efficient Speech-to-Animation

Arxiv

0+阅读 · 2021年11月18日

Investigation of Speaker-adaptation methods in Transformer based ASR

Arxiv

0+阅读 · 2021年11月17日

Joint Unsupervised and Supervised Training for Multilingual ASR

Arxiv

0+阅读 · 2021年11月15日

Phase-aware Speech Enhancement with Deep Complex U-Net

Phase-aware Speech Enhancement with Deep Complex U-Net

Arxiv

15+阅读 · 2019年3月7日

Psi-Net: Shape and boundary aware joint multi-task deep network for medical image segmentation

Psi-Net: Shape and boundary aware joint multi-task deep network for medical image segmentation

Arxiv

8+阅读 · 2019年2月11日

Syllable-Based Sequence-to-Sequence Speech Recognition with the Transformer in Mandarin Chinese

Arxiv

5+阅读 · 2018年6月4日

Recursive Neural Network Based Preordering for English-to-Japanese Machine Translation

Arxiv

7+阅读 · 2018年5月25日

Dual Attention Matching Network for Context-Aware Feature Sequence based Person Re-Identification

Arxiv

5+阅读 · 2018年3月27日

Multilingual Training and Cross-lingual Adaptation on CTC-based Acoustic Model

Arxiv

7+阅读 · 2018年1月23日

State-of-the-art Speech Recognition With Sequence-to-Sequence Models

Arxiv

7+阅读 · 2018年1月18日

VIP会员

文章信息

相关主题

相关VIP内容

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

【Yoshua Bengio新论文】多任务自监督学习语音识别，MULTI-TASK SELF-SUPERVISED LEARNING FOR ROBUST SPEECH RECOGNITION

【Yoshua Bengio新论文】多任务自监督学习语音识别，MULTI-TASK SELF-SUPERVISED LEARNING FOR ROBUST SPEECH RECOGNITION

专知会员服务

39+阅读 · 2020年1月30日

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

专知会员服务

43+阅读 · 2020年1月28日

【中科院自动化所】序列到序列语音识别的无监督预训练（Unsupervised pre-training for sequence to sequence speech recognition）

【中科院自动化所】序列到序列语音识别的无监督预训练（Unsupervised pre-training for sequence to sequence speech recognition）

专知会员服务

33+阅读 · 2020年1月5日

【ICLR2020 预训练的百科全书】弱监督的知识-预训练的语言模型（PRETRAINED ENCYCLOPEDIA: WEAKLY SUPERVISED KNOWLEDGE-PRETRAINED LANGUAGE MODEL）

【ICLR2020 预训练的百科全书】弱监督的知识-预训练的语言模型（PRETRAINED ENCYCLOPEDIA: WEAKLY SUPERVISED KNOWLEDGE-PRETRAINED LANGUAGE MODEL）

专知会员服务

25+阅读 · 2019年12月26日

【元学习 | ICASSP2020提交论文】学习低资源语音识别，国立台湾大学 | 李宏毅

【元学习 | ICASSP2020提交论文】学习低资源语音识别，国立台湾大学 | 李宏毅

专知会员服务

57+阅读 · 2019年11月21日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

热门VIP内容

开通专知VIP会员享更多权益服务

数据驱动死亡：以色列AI战争机器如何锁定目标

【普林斯顿博士论文】通过以人为本的评估推动负责任的人工智能

ICML 2025 | BiAssemble: 双臂机器人几何拼合问题的协同可供性学习

ICML 2025杰出论文出炉：8篇获奖，南大研究者榜上有名

相关资讯

最强NLP预训练模型库PyTorch-Transformers正式开源！支持6个预训练框架，27个预训练模型

最强NLP预训练模型库PyTorch-Transformers正式开源！支持6个预训练框架，27个预训练模型

AI前线

12+阅读 · 2019年7月22日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

【论文推荐】最新八篇情感分析相关论文—Pair-wise判别器、多模态情感分析、上下文语境、Gated 卷积网络

【论文推荐】最新八篇情感分析相关论文—Pair-wise判别器、多模态情感分析、上下文语境、Gated 卷积网络

专知

20+阅读 · 2018年6月29日

【论文推荐】最新八篇情感分析相关论文—注意力网络、多模态情感分析、情感分析局限性、跨语言情感分类、多语言情感分析

【论文推荐】最新八篇情感分析相关论文—注意力网络、多模态情感分析、情感分析局限性、跨语言情感分类、多语言情感分析

专知

52+阅读 · 2018年6月28日

【论文推荐】最新6篇目标跟踪相关论文—动态记忆网络、相关滤波器、单次学习、相关、循环自回归网络、三维多目标

【论文推荐】最新6篇目标跟踪相关论文—动态记忆网络、相关滤波器、单次学习、相关、循环自回归网络、三维多目标

专知

7+阅读 · 2018年3月21日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

相关论文

Transformer-S2A: Robust and Efficient Speech-to-Animation

Transformer-S2A: Robust and Efficient Speech-to-Animation

Arxiv

0+阅读 · 2021年11月18日

Investigation of Speaker-adaptation methods in Transformer based ASR

Arxiv

0+阅读 · 2021年11月17日

Joint Unsupervised and Supervised Training for Multilingual ASR

Arxiv

0+阅读 · 2021年11月15日

Phase-aware Speech Enhancement with Deep Complex U-Net

Phase-aware Speech Enhancement with Deep Complex U-Net

Arxiv

15+阅读 · 2019年3月7日

Psi-Net: Shape and boundary aware joint multi-task deep network for medical image segmentation

Psi-Net: Shape and boundary aware joint multi-task deep network for medical image segmentation

Arxiv

8+阅读 · 2019年2月11日

Syllable-Based Sequence-to-Sequence Speech Recognition with the Transformer in Mandarin Chinese

Arxiv

5+阅读 · 2018年6月4日

Recursive Neural Network Based Preordering for English-to-Japanese Machine Translation

Arxiv

7+阅读 · 2018年5月25日

Dual Attention Matching Network for Context-Aware Feature Sequence based Person Re-Identification

Arxiv

5+阅读 · 2018年3月27日

Multilingual Training and Cross-lingual Adaptation on CTC-based Acoustic Model

Arxiv

7+阅读 · 2018年1月23日

State-of-the-art Speech Recognition With Sequence-to-Sequence Models

Arxiv

7+阅读 · 2018年1月18日

微信扫码咨询专知VIP会员