March 20, 2023

Code-Switching Text Generation and Injection in Mandarin-English ASR


Haibin Yu, Yuxuan Hu, Yao Qian, Ma Jin, Linquan Liu, Shujie Liu, Yu Shi, Yanmin Qian, Edward Lin, Michael Zeng

From arXiv; accepted by ICASSP 2023

Code-switching speech mixes two or more languages within a single utterance. Automatic Speech Recognition (ASR) with End-to-End (E2E) modeling for such speech can be challenging due to the lack of data. In this study, we investigate text generation and injection for improving the performance of a streaming model commonly used in industry, the Transformer-Transducer (T-T), in Mandarin-English code-switching speech recognition. We first propose a strategy to generate code-switching text data and then investigate injecting the generated text into the T-T model either explicitly, by Text-To-Speech (TTS) conversion, or implicitly, by tying the speech and text latent spaces. Experimental results on a T-T model trained with a dataset containing 1,800 hours of real Mandarin-English code-switched speech show that our approaches to injecting generated code-switching text significantly boost the performance of T-T models, yielding a 16% relative Token-based Error Rate (TER) reduction averaged over three evaluation sets. The approach of tying speech and text latent spaces is superior to TTS conversion on the evaluation set whose data is more homogeneous with the training set.
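To make the reported metric concrete, here is a minimal sketch of how a token-based error rate and a relative reduction could be computed for mixed Mandarin-English transcripts. The abstract does not define the tokenization, so this sketch assumes one token per Chinese character and one token per English word; the function names (`tokenize`, `token_error_rate`, `relative_reduction`) and the example numbers are hypothetical and are not taken from the paper.

```python
import re
from typing import List, Sequence

# Assumption: a Mandarin token is a single CJK character, and an English
# token is a run of Latin letters, digits, or apostrophes.
_TOKEN_RE = re.compile(r"[\u4e00-\u9fff]|[A-Za-z0-9']+")


def tokenize(text: str) -> List[str]:
    """Split a mixed Mandarin-English string into tokens (lowercased)."""
    return [tok.lower() for tok in _TOKEN_RE.findall(text)]


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def token_error_rate(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Total token edits divided by total reference tokens."""
    errors = 0
    total = 0
    for ref, hyp in zip(refs, hyps):
        ref_tokens = tokenize(ref)
        errors += edit_distance(ref_tokens, tokenize(hyp))
        total += len(ref_tokens)
    if total == 0:
        raise ValueError("references contain no tokens")
    return errors / total


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative error-rate reduction of `improved` with respect to `baseline`."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    refs = ["我想 check 一下 email"]
    hyps = ["我想 check 一下 邮件"]
    print(token_error_rate(refs, hyps))
    # Illustrative numbers only: a baseline TER of 10.0% and an improved
    # TER of 8.4% would correspond to a 16% relative reduction.
    print(relative_reduction(0.10, 0.084))
```

Under this assumed tokenization, the reference above has six tokens, and the hypothesis differs by one substitution and one insertion. A relative reduction is measured against the baseline error rate, so a 16% relative reduction is not a drop of 16 absolute points.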



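Related content

Speech recognition

Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methods and technologies enabling computers to recognize spoken language and translate it into text. It is also known as automatic speech recognition (ASR), computer speech recognition, or speech-to-text (STT). It draws on knowledge and research from computer science, linguistics, and computer engineering.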

