Consonant and vowel reduction are often encountered in speech and may degrade the performance of automatic speech recognition (ASR). Our recently proposed masking-based learning strategy, Phone Masking Training (PMT), alleviates the impact of such phenomena in Uyghur ASR. Although PMT achieves remarkable improvements, there is still room for further gains due to the granularity mismatch between the masking unit of PMT (phoneme) and the modeling unit (word-piece). To boost the performance of PMT, we propose a multi-modeling-unit training (MMUT) architecture fused with PMT (PM-MMUT). The idea of the MMUT framework is to split the encoder into two parts: one mapping acoustic feature sequences to a phoneme-level representation (AF-to-PLR) and one mapping the phoneme-level representation to a word-piece-level representation (PLR-to-WPLR). This allows AF-to-PLR to be optimized by an intermediate phoneme-based CTC loss, so that it learns the rich phoneme-level context information brought by PMT. Experimental results on Uyghur ASR show that the proposed approaches clearly outperform pure PMT. We also conduct experiments on the 960-hour Librispeech benchmark using ESPnet1, achieving about 10% relative WER reduction on all test sets without LM fusion compared with the latest official ESPnet1 pre-trained model.
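To make the encoder split concrete, the following is a minimal sketch (not the authors' implementation) of how an AF-to-PLR / PLR-to-WPLR split with an intermediate phoneme-level CTC loss could be composed. Layer counts, vocabulary sizes, the absence of subsampling, and the interpolation weight `lambda_phone` are illustrative assumptions, not values taken from the paper.

```python
# Hypothetical sketch of the PM-MMUT loss composition: the encoder is split
# into AF-to-PLR and PLR-to-WPLR sub-encoders; an intermediate phoneme-level
# CTC loss is applied after AF-to-PLR and a word-piece-level CTC loss after
# PLR-to-WPLR. All hyper-parameters below are assumptions for illustration.
import torch
import torch.nn as nn


class PMMMUTEncoder(nn.Module):
    def __init__(self, feat_dim=80, d_model=256, n_phones=50, n_wordpieces=5000,
                 af_to_plr_layers=6, plr_to_wplr_layers=6, lambda_phone=0.3):
        super().__init__()
        self.input_proj = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        # AF-to-PLR: acoustic feature sequences -> phoneme-level representation
        self.af_to_plr = nn.TransformerEncoder(layer, num_layers=af_to_plr_layers)
        # PLR-to-WPLR: phoneme-level -> word-piece-level representation
        self.plr_to_wplr = nn.TransformerEncoder(layer, num_layers=plr_to_wplr_layers)
        self.phone_head = nn.Linear(d_model, n_phones)   # intermediate CTC head
        self.wp_head = nn.Linear(d_model, n_wordpieces)  # final CTC head
        self.ctc = nn.CTCLoss(blank=0, zero_infinity=True)
        self.lambda_phone = lambda_phone

    def forward(self, feats, feat_lens, phone_tgts, phone_lens, wp_tgts, wp_lens):
        x = self.input_proj(feats)        # (B, T, d_model)
        plr = self.af_to_plr(x)           # phoneme-level representation
        wplr = self.plr_to_wplr(plr)      # word-piece-level representation

        # CTC expects (T, B, V) log-probabilities.
        phone_logp = self.phone_head(plr).log_softmax(-1).transpose(0, 1)
        wp_logp = self.wp_head(wplr).log_softmax(-1).transpose(0, 1)

        loss_phone = self.ctc(phone_logp, phone_tgts, feat_lens, phone_lens)
        loss_wp = self.ctc(wp_logp, wp_tgts, feat_lens, wp_lens)
        # Interpolate the intermediate phoneme CTC loss with the word-piece loss.
        return self.lambda_phone * loss_phone + (1.0 - self.lambda_phone) * loss_wp
```

In this sketch the intermediate loss only constrains the lower sub-encoder (AF-to-PLR), which is the part intended to benefit from the phoneme-level context learned through PMT; the upper sub-encoder remains supervised by the word-piece targets.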