End-to-End Rich Transcription-Style Automatic Speech Recognition with Semi-Supervised Learning - 专知论文

Automatic speech recognition · Tokenizer · Speech recognition · End-to-end · Dataset

July 7, 2021

End-to-End Rich Transcription-Style Automatic Speech Recognition with Semi-Supervised Learning

Tomohiro Tanaka, Ryo Masumura, Mana Ihori, Akihiko Takashima, Shota Orihashi, Naoki Makishima

From arXiv; accepted at Interspeech 2021

We propose a semi-supervised learning method for building end-to-end rich transcription-style automatic speech recognition (RT-ASR) systems from a small-scale rich transcription-style dataset and a large-scale common transcription-style dataset. Spontaneous speech often contains phenomena such as fillers, word fragments, laughter, and coughs. Common transcriptions do not mark these phenomena, whereas rich transcriptions explicitly convert them into special phenomenon tokens alongside the textual tokens. Previous studies estimated the textual and phenomenon tokens jointly in an end-to-end manner. However, accurate RT-ASR systems are difficult to build because large-scale rich transcription-style datasets are often unavailable. To solve this problem, our training method uses a limited rich transcription-style dataset and a common transcription-style dataset simultaneously. The key process in our semi-supervised learning is to convert the common transcription-style dataset into a pseudo-rich transcription-style dataset. To this end, we introduce style tokens, which control whether phenomenon tokens are generated, into transformer-based autoregressive modeling. We use this modeling both to generate the pseudo-rich transcription-style dataset and to build an RT-ASR system from the pseudo and original datasets. Our experiments on spontaneous ASR tasks showed the effectiveness of the proposed method.
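The data flow described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the style token names `<rich>` and `<common>`, the phenomenon token names in the example, the helper functions, and the `decode` callable (a stand-in for a trained transformer-based model) are all hypothetical, and the abstract does not say where style tokens are placed in the sequence; the sketch assumes they prefix the target.

```python
from typing import Callable, List, Tuple

# Hypothetical style tokens; the abstract does not name them.
RICH = "<rich>"      # ask the model to emit phenomenon tokens
COMMON = "<common>"  # ask the model to emit textual tokens only

# One example is (audio identifier, target token sequence).
Example = Tuple[str, List[str]]


def add_style(style: str, tokens: List[str]) -> List[str]:
    """Prefix a target sequence with its style token (assumed placement)."""
    return [style] + list(tokens)


def mixed_targets(rich_set: List[Example], common_set: List[Example]) -> List[Example]:
    """Tag every transcript with the style it was written in, so that one
    autoregressive model can be trained on both datasets at once."""
    tagged = [(audio, add_style(RICH, toks)) for audio, toks in rich_set]
    tagged += [(audio, add_style(COMMON, toks)) for audio, toks in common_set]
    return tagged


def pseudo_rich(decode: Callable[[str, str], List[str]],
                common_set: List[Example]) -> List[Example]:
    """Re-decode the audio of the common-style dataset while requesting the
    rich style, yielding pseudo-rich transcripts."""
    return [(audio, decode(audio, RICH)) for audio, _ in common_set]


if __name__ == "__main__":
    rich = [("a.wav", ["<filler>", "um", "hello"])]
    common = [("b.wav", ["hello", "world"])]

    # Toy decoder standing in for a trained model (purely illustrative).
    def toy_decode(audio: str, style: str) -> List[str]:
        base = ["hello", "world"]
        return ["<laugh>"] + base if style == RICH else base

    print(mixed_targets(rich, common))
    print(pseudo_rich(toy_decode, common))
```

In this sketch the final RT-ASR system would be trained on the original rich examples together with the output of `pseudo_rich`; how the pseudo transcripts are filtered or weighted is not stated in the abstract and is left out here.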
