Speech separation has been successfully applied as a frontend processing module of conversation transcription systems thanks to its ability to handle overlapped speech and its flexibility to combine with downstream tasks such as automatic speech recognition (ASR). However, a speech separation model often introduces target speech distortion, resulting in a suboptimal word error rate (WER). In this paper, we describe our efforts to improve the performance of a single-channel speech separation system. Specifically, we investigate a two-stage training scheme that first applies a feature-level optimization criterion for pretraining, followed by an ASR-oriented optimization criterion using an end-to-end (E2E) speech recognition model. In addition, to keep the model lightweight, we introduce a modified teacher-student learning technique for model compression. By combining these approaches, we achieve absolute average WER improvements of 2.70% and 0.77% using models with fewer than 10M parameters, compared with the previous state-of-the-art results on the LibriCSS dataset for utterance-wise evaluation and continuous evaluation, respectively.
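The abstract does not spell out the training recipe, so the following PyTorch sketch illustrates one plausible reading of the two-stage scheme: feature-level pretraining of the separator, then ASR-oriented fine-tuning through a frozen E2E recognizer. The tiny `SeparatorNet`, the SI-SNR criterion, the dummy CTC recognizer, and all hyperparameters are illustrative assumptions, not the paper's actual models.

```python
import torch
import torch.nn as nn

# Tiny stand-in separator; the paper's actual architecture differs.
class SeparatorNet(nn.Module):
    def __init__(self, n_src=2):
        super().__init__()
        self.net = nn.Conv1d(1, n_src, kernel_size=3, padding=1)

    def forward(self, mix):                # mix: (B, T)
        return self.net(mix.unsqueeze(1))  # (B, n_src, T)

def neg_si_snr(est, ref, eps=1e-8):
    """Negative scale-invariant SNR, a common feature-level criterion."""
    est = est - est.mean(-1, keepdim=True)
    ref = ref - ref.mean(-1, keepdim=True)
    proj = (est * ref).sum(-1, keepdim=True) * ref / (ref.pow(2).sum(-1, keepdim=True) + eps)
    noise = est - proj
    snr = 10 * torch.log10(proj.pow(2).sum(-1) / (noise.pow(2).sum(-1) + eps) + eps)
    return -snr.mean()

separator = SeparatorNet()
opt = torch.optim.Adam(separator.parameters(), lr=1e-3)

# Stage 1: feature-level pretraining on (mixture, clean sources) pairs.
mix, srcs = torch.randn(4, 16000), torch.randn(4, 2, 16000)  # dummy batch
loss = neg_si_snr(separator(mix), srcs)
opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: ASR-oriented fine-tuning. The recognizer below is a dummy
# stand-in for a pretrained E2E ASR model; its parameters are frozen,
# so gradients flow through it back into the separator only.
asr = nn.Sequential(nn.Conv1d(1, 32, 160, stride=160), nn.ReLU(),
                    nn.Conv1d(32, 29, 1))  # 29-way CTC vocabulary (assumed)
for p in asr.parameters():
    p.requires_grad_(False)
ctc = nn.CTCLoss(blank=0)

est = separator(mix)                                          # (B, 2, T)
logp = asr(est.reshape(-1, 1, est.size(-1))).log_softmax(1)   # (B*2, 29, T')
logp = logp.permute(2, 0, 1)                                  # (T', B*2, 29)
targets = torch.randint(1, 29, (8, 20))                       # dummy transcripts
in_lens = torch.full((8,), logp.size(0), dtype=torch.long)
tgt_lens = torch.full((8,), 20, dtype=torch.long)
loss = ctc(logp, targets, in_lens, tgt_lens)
opt.zero_grad(); loss.backward(); opt.step()
```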
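For the compression step, here is a minimal teacher-student sketch reusing `SeparatorNet` from the sketch above. The abstract only says the technique is a modified form of teacher-student learning; interpolating the frozen teacher's outputs with the ground-truth sources via an assumed weight `alpha` is one common variant, shown purely for illustration.

```python
import copy
import torch
import torch.nn.functional as F

# Frozen teacher = the large separator trained above; in the paper the
# student is a smaller network with fewer than 10M parameters.
teacher = copy.deepcopy(separator).eval()
student = SeparatorNet()  # a smaller variant in practice
s_opt = torch.optim.Adam(student.parameters(), lr=1e-3)

mix, srcs = torch.randn(4, 16000), torch.randn(4, 2, 16000)  # dummy batch
with torch.no_grad():
    t_out = teacher(mix)
s_out = student(mix)

# Soft targets: blend teacher outputs with the clean sources (assumed).
alpha = 0.5
loss = F.mse_loss(s_out, alpha * t_out + (1 - alpha) * srcs)
s_opt.zero_grad(); loss.backward(); s_opt.step()
```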