自动承认长式语音语音自动语音识别 (Directed Speech Separation for Automatic Speech Recognition of Long Form Conversational Speech) - 专知论文

会员服务 ·

0

分离的 · 自动语音识别 · 语音识别 · 有向 · MoDELS ·

2021 年 12 月 10 日

Directed Speech Separation for Automatic Speech Recognition of Long Form Conversational Speech

翻译：自动承认长式语音语音自动语音识别

Rohit Paturi,Sundararajan Srinivasan,Katrin Kirchhoff

Many of the recent advances in speech separation are primarily aimed at synthetic mixtures of short audio utterances with high degrees of overlap. These datasets significantly differ from the real conversational data and hence, the models trained and evaluated on these datasets do not generalize to real conversational scenarios. Another issue with using most of these models for long form speech is the nondeterministic ordering of separated speech segments due to either unsupervised clustering for time-frequency masks or Permutation Invariant training (PIT) loss. This leads to difficulty in accurately stitching homogenous speaker segments for downstream tasks like Automatic Speech Recognition (ASR). In this paper, we propose a speaker conditioned separator trained on speaker embeddings extracted directly from the mixed signal. We train this model using a directed loss which regulates the order of the separated segments. With this model, we achieve significant improvements on Word error rate (WER) for real conversational data without the need for an additional re-stitching step.

翻译：在语音分隔方面最近取得的许多进展主要针对合成的短音音量组合,其高度重叠。这些数据集与真实的谈话数据大不相同,因此,在这些数据集上经过训练和评价的模型并不概括于真实的谈话情景。使用大多数这些模型进行长式语音表达的另一个问题是,由于对时间-频面罩或变异性培训的损失没有监督的组合,分离的语音部分没有确定顺序。这导致难以精确地为下游任务,如自动语音识别(ASR),对同质语音部分进行缝合。在本文中,我们建议用一个通过隔开的信号直接嵌入语音信号而培训的有主机的分隔器。我们用一种直接损失来训练这一模型,以调节分离部分的顺序。使用这一模型,我们大大改进了对真实对话数据的WER错误率,而无需采取额外的重新细化步骤。

0

相关内容

分离的

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

专知会员服务

44+阅读 · 2020年11月2日

【InterSpeech2020】混合语音识别系统中的词汇扩展技术，Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems

【InterSpeech2020】混合语音识别系统中的词汇扩展技术，Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems

专知会员服务

17+阅读 · 2020年3月23日

【Google Research】Wavesplit:通过说话者聚类实现端到端的语音分离，Wavesplit: End-to-End Speech Separation by Speaker Clustering

【Google Research】Wavesplit:通过说话者聚类实现端到端的语音分离，Wavesplit: End-to-End Speech Separation by Speaker Clustering

专知会员服务

19+阅读 · 2020年2月26日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【中科院自动化所】序列到序列语音识别的无监督预训练（Unsupervised pre-training for sequence to sequence speech recognition）

【中科院自动化所】序列到序列语音识别的无监督预训练（Unsupervised pre-training for sequence to sequence speech recognition）

专知会员服务

33+阅读 · 2020年1月5日

【NLP| 推荐文章】语言语音处理（Speech and Language Processing(3rd ed.draft)）

专知会员服务

15+阅读 · 2019年11月24日

【EMNLP 2019 最佳论文】信息瓶颈专门化单词嵌入（用于解析）（Specializing Word Embeddings（for Parsing）by Information Bottleneck）

【EMNLP 2019 最佳论文】信息瓶颈专门化单词嵌入（用于解析）（Specializing Word Embeddings（for Parsing）by Information Bottleneck）

专知会员服务

24+阅读 · 2019年11月20日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

ICLR2019最佳论文出炉

ICLR2019最佳论文出炉

专知

12+阅读 · 2019年5月6日

最新NLP论文阅读列表，包括对话、问答、摘要、翻译、看图说话等

最新NLP论文阅读列表，包括对话、问答、摘要、翻译、看图说话等

专知

9+阅读 · 2019年3月22日

【跟踪Tracking】15篇论文+代码 | 中秋快乐~

【跟踪Tracking】15篇论文+代码 | 中秋快乐~

专知

18+阅读 · 2018年9月24日

自然语言处理顶会EMNLP2018接受论文列表！

自然语言处理顶会EMNLP2018接受论文列表！

专知

87+阅读 · 2018年8月26日

已删除

清华大学研究生教育

3+阅读 · 2018年6月30日

开源自动语音识别系统wav2letter (附实现教程)

开源自动语音识别系统wav2letter (附实现教程)

七月在线实验室

10+阅读 · 2018年1月8日

【推荐】RNN无损压缩方法DeepZip（附代码）

【推荐】RNN无损压缩方法DeepZip（附代码）

机器学习研究会

5+阅读 · 2018年1月1日

分布式TensorFlow入门指南

分布式TensorFlow入门指南

机器学习研究会

4+阅读 · 2017年11月28日

【推荐】自然语言处理（NLP）指南

【推荐】自然语言处理（NLP）指南

机器学习研究会

35+阅读 · 2017年11月17日

【推荐】用Tensorflow理解LSTM

【推荐】用Tensorflow理解LSTM

机器学习研究会

36+阅读 · 2017年9月11日

Unsupervised word-level prosody tagging for controllable speech synthesis

Unsupervised word-level prosody tagging for controllable speech synthesis

Arxiv

0+阅读 · 2022年2月16日

Textless Speech Emotion Conversion using Discrete and Decomposed Representations

Arxiv

0+阅读 · 2022年2月15日

USTED: Improving ASR with a Unified Speech and Text Encoder-Decoder

Arxiv

0+阅读 · 2022年2月12日

Training Robust Zero-Shot Voice Conversion Models with Self-supervised Features

Arxiv

0+阅读 · 2022年2月10日

Enhancing ASR for Stuttered Speech with Limited Data Using Detect and Pass

Arxiv

0+阅读 · 2022年2月8日

End-to-End Multi-speaker Speech Recognition with Transformer

Arxiv

8+阅读 · 2020年2月13日

Exploring RNN-Transducer for Chinese Speech Recognition

Arxiv

4+阅读 · 2019年4月23日

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

Arxiv

7+阅读 · 2019年4月18日

Improved Speech Enhancement with the Wave-U-Net

Arxiv

8+阅读 · 2018年11月27日

End-to-End Speech Recognition From the Raw Waveform

Arxiv

3+阅读 · 2018年6月19日

VIP会员

文章信息

相关主题

自动语音识别

相关VIP内容

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

专知会员服务

44+阅读 · 2020年11月2日

【InterSpeech2020】混合语音识别系统中的词汇扩展技术，Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems

【InterSpeech2020】混合语音识别系统中的词汇扩展技术，Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems

专知会员服务

17+阅读 · 2020年3月23日

【Google Research】Wavesplit:通过说话者聚类实现端到端的语音分离，Wavesplit: End-to-End Speech Separation by Speaker Clustering

【Google Research】Wavesplit:通过说话者聚类实现端到端的语音分离，Wavesplit: End-to-End Speech Separation by Speaker Clustering

专知会员服务

19+阅读 · 2020年2月26日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【中科院自动化所】序列到序列语音识别的无监督预训练（Unsupervised pre-training for sequence to sequence speech recognition）

【中科院自动化所】序列到序列语音识别的无监督预训练（Unsupervised pre-training for sequence to sequence speech recognition）

专知会员服务

33+阅读 · 2020年1月5日

【NLP| 推荐文章】语言语音处理（Speech and Language Processing(3rd ed.draft)）

专知会员服务

15+阅读 · 2019年11月24日

【EMNLP 2019 最佳论文】信息瓶颈专门化单词嵌入（用于解析）（Specializing Word Embeddings（for Parsing）by Information Bottleneck）

【EMNLP 2019 最佳论文】信息瓶颈专门化单词嵌入（用于解析）（Specializing Word Embeddings（for Parsing）by Information Bottleneck）

专知会员服务

24+阅读 · 2019年11月20日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

热门VIP内容

开通专知VIP会员享更多权益服务

从社会学实验到行为仿真：理解基于Agent的观点动力学建模思维

中英文版《GPT-5 System Card速览》报告

ACL 2025 | 大模型结构化知识提示的泛化能力研究

【普林斯顿博士论文】大型模型的高效推理

相关资讯

ICLR2019最佳论文出炉

ICLR2019最佳论文出炉

专知

12+阅读 · 2019年5月6日

最新NLP论文阅读列表，包括对话、问答、摘要、翻译、看图说话等

最新NLP论文阅读列表，包括对话、问答、摘要、翻译、看图说话等

专知

9+阅读 · 2019年3月22日

【跟踪Tracking】15篇论文+代码 | 中秋快乐~

【跟踪Tracking】15篇论文+代码 | 中秋快乐~

专知

18+阅读 · 2018年9月24日

自然语言处理顶会EMNLP2018接受论文列表！

自然语言处理顶会EMNLP2018接受论文列表！

专知

87+阅读 · 2018年8月26日

已删除

清华大学研究生教育

3+阅读 · 2018年6月30日

开源自动语音识别系统wav2letter (附实现教程)

开源自动语音识别系统wav2letter (附实现教程)

七月在线实验室

10+阅读 · 2018年1月8日

【推荐】RNN无损压缩方法DeepZip（附代码）

【推荐】RNN无损压缩方法DeepZip（附代码）

机器学习研究会

5+阅读 · 2018年1月1日

分布式TensorFlow入门指南

分布式TensorFlow入门指南

机器学习研究会

4+阅读 · 2017年11月28日

【推荐】自然语言处理（NLP）指南

【推荐】自然语言处理（NLP）指南

机器学习研究会

35+阅读 · 2017年11月17日

【推荐】用Tensorflow理解LSTM

【推荐】用Tensorflow理解LSTM

机器学习研究会

36+阅读 · 2017年9月11日

相关论文

Unsupervised word-level prosody tagging for controllable speech synthesis

Unsupervised word-level prosody tagging for controllable speech synthesis

Arxiv

0+阅读 · 2022年2月16日

Textless Speech Emotion Conversion using Discrete and Decomposed Representations

Arxiv

0+阅读 · 2022年2月15日

USTED: Improving ASR with a Unified Speech and Text Encoder-Decoder

Arxiv

0+阅读 · 2022年2月12日

Training Robust Zero-Shot Voice Conversion Models with Self-supervised Features

Arxiv

0+阅读 · 2022年2月10日

Enhancing ASR for Stuttered Speech with Limited Data Using Detect and Pass

Arxiv

0+阅读 · 2022年2月8日

End-to-End Multi-speaker Speech Recognition with Transformer

Arxiv

8+阅读 · 2020年2月13日

Exploring RNN-Transducer for Chinese Speech Recognition

Arxiv

4+阅读 · 2019年4月23日

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

Arxiv

7+阅读 · 2019年4月18日

Improved Speech Enhancement with the Wave-U-Net

Arxiv

8+阅读 · 2018年11月27日

End-to-End Speech Recognition From the Raw Waveform

Arxiv

3+阅读 · 2018年6月19日

微信扫码咨询专知VIP会员