与 DNN 音响模型重新审议基于多对话器语音识别的基于联合解码的多语层语音识别 (Revisiting joint decoding based multi-talker speech recognition with DNN acoustic model) - 专知论文

会员服务 ·

0

解码 · MoDELS · 语音识别 · DNN · contrastive ·

2021 年 10 月 31 日

Revisiting joint decoding based multi-talker speech recognition with DNN acoustic model

翻译：与 DNN 音响模型重新审议基于多对话器语音识别的基于联合解码的多语层语音识别

Martin Kocour,Kateřina Žmolíková,Lucas Ondel,Ján Švec,Marc Delcroix,Tsubasa Ochiai,Lukáš Burget,Jan Černocký

from arxiv, submitted to ICASSP 2022

In typical multi-talker speech recognition systems, a neural network-based acoustic model predicts senone state posteriors for each speaker. These are later used by a single-talker decoder which is applied on each speaker-specific output stream separately. In this work, we argue that such a scheme is sub-optimal and propose a principled solution that decodes all speakers jointly. We modify the acoustic model to predict joint state posteriors for all speakers, enabling the network to express uncertainty about the attribution of parts of the speech signal to the speakers. We employ a joint decoder that can make use of this uncertainty together with higher-level language information. For this, we revisit decoding algorithms used in factorial generative models in early multi-talker speech recognition systems. In contrast with these early works, we replace the GMM acoustic model with DNN, which provides greater modeling power and simplifies part of the inference. We demonstrate the advantage of joint decoding in proof of concept experiments on a mixed-TIDIGITS dataset.

翻译：在典型的多讲者语音识别系统中,一个基于神经网络的声学模型预测每个发言者都使用神经网络状态后台,这些后台后来被单独适用于每个发言者特定输出流的单一讲台解码器使用。在这项工作中,我们争辩说,这种计划是次优化的,并提出了一个原则性解决办法,将所有发言者都解码起来。我们修改了声学模型,以预测所有发言者的联合国家后台,使网络能够表达对部分语音信号归属给发言者的不确定性。我们使用一个联合解码器,可以使用这一不确定性与更高层次的语言信息一起使用。为此,我们重新审视在早期多讲台语音识别系统中用于保分质基因模型的解码算法。与这些早期工程相比,我们用DNN(GM)取代了GM声学模型,该模型提供了更大的建模能力,并简化了部分推理。我们展示了在混合的TIGITS数据集上进行概念实验的优势。

0

相关内容

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

专知会员服务

44+阅读 · 2020年11月2日

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

111+阅读 · 2020年5月15日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【Yoshua Bengio新论文】多任务自监督学习语音识别，MULTI-TASK SELF-SUPERVISED LEARNING FOR ROBUST SPEECH RECOGNITION

【Yoshua Bengio新论文】多任务自监督学习语音识别，MULTI-TASK SELF-SUPERVISED LEARNING FOR ROBUST SPEECH RECOGNITION

专知会员服务

39+阅读 · 2020年1月30日

【中科院自动化所】序列到序列语音识别的无监督预训练（Unsupervised pre-training for sequence to sequence speech recognition）

【中科院自动化所】序列到序列语音识别的无监督预训练（Unsupervised pre-training for sequence to sequence speech recognition）

专知会员服务

33+阅读 · 2020年1月5日

【NLP| 推荐文章】语言语音处理（Speech and Language Processing(3rd ed.draft)）

专知会员服务

15+阅读 · 2019年11月24日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

【资源】语音增强资源集锦

【资源】语音增强资源集锦

专知

8+阅读 · 2020年7月4日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

计算机 | ISMAR 2019等国际会议信息8条

计算机 | ISMAR 2019等国际会议信息8条

Call4Papers

3+阅读 · 2019年3月5日

VALSE Webinar 19-05期自动机器学习 AutoML

VALSE Webinar 19-05期自动机器学习 AutoML

VALSE

8+阅读 · 2019年2月28日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

【推荐】自然语言处理（NLP）指南

【推荐】自然语言处理（NLP）指南

机器学习研究会

35+阅读 · 2017年11月17日

可解释的CNN

可解释的CNN

CreateAMind

17+阅读 · 2017年10月5日

【泡泡机器人公开课】第八十六课多相机视觉SLAM介绍 - 杨绍武

【泡泡机器人公开课】第八十六课多相机视觉SLAM介绍 - 杨绍武

机器学习研究会

4+阅读 · 2017年9月20日

Diformer: Directional Transformer for Neural Machine Translation

Arxiv

0+阅读 · 2021年12月31日

The Classical Multidimensional Scaling Revisited

Arxiv

0+阅读 · 2021年12月29日

Multi-Dialect Arabic Speech Recognition

Arxiv

0+阅读 · 2021年12月25日

Efficient Data-specific Model Search for Collaborative Filtering

Efficient Data-specific Model Search for Collaborative Filtering

Arxiv

4+阅读 · 2021年6月14日

End-to-End Multi-speaker Speech Recognition with Transformer

Arxiv

8+阅读 · 2020年2月13日

Insertion-based Decoding with automatically Inferred Generation Order

Arxiv

5+阅读 · 2019年2月28日

End-to-end Speech Recognition with Word-based RNN Language Models

End-to-end Speech Recognition with Word-based RNN Language Models

Arxiv

3+阅读 · 2018年8月8日

Cross-type Biomedical Named Entity Recognition with Deep Multi-Task Learning

Arxiv

10+阅读 · 2018年1月30日

Multilingual Training and Cross-lingual Adaptation on CTC-based Acoustic Model

Arxiv

7+阅读 · 2018年1月23日

State-of-the-art Speech Recognition With Sequence-to-Sequence Models

Arxiv

7+阅读 · 2018年1月18日

VIP会员

文章信息

相关主题

相关VIP内容

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

专知会员服务

44+阅读 · 2020年11月2日

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

111+阅读 · 2020年5月15日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【Yoshua Bengio新论文】多任务自监督学习语音识别，MULTI-TASK SELF-SUPERVISED LEARNING FOR ROBUST SPEECH RECOGNITION

【Yoshua Bengio新论文】多任务自监督学习语音识别，MULTI-TASK SELF-SUPERVISED LEARNING FOR ROBUST SPEECH RECOGNITION

专知会员服务

39+阅读 · 2020年1月30日

【中科院自动化所】序列到序列语音识别的无监督预训练（Unsupervised pre-training for sequence to sequence speech recognition）

【中科院自动化所】序列到序列语音识别的无监督预训练（Unsupervised pre-training for sequence to sequence speech recognition）

专知会员服务

33+阅读 · 2020年1月5日

【NLP| 推荐文章】语言语音处理（Speech and Language Processing(3rd ed.draft)）

专知会员服务

15+阅读 · 2019年11月24日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

热门VIP内容

开通专知VIP会员享更多权益服务

《人工智能绝不能完全自主》

《人工智能的法律与伦理：军事自主机器独特挑战的深度剖析》316页

从数据到主导：AI与兵棋推演构筑决策优势

《特洛伊木马货柜：武器化集装箱的战略威胁》最新报告

相关资讯

【资源】语音增强资源集锦

【资源】语音增强资源集锦

专知

8+阅读 · 2020年7月4日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

计算机 | ISMAR 2019等国际会议信息8条

计算机 | ISMAR 2019等国际会议信息8条

Call4Papers

3+阅读 · 2019年3月5日

VALSE Webinar 19-05期自动机器学习 AutoML

VALSE Webinar 19-05期自动机器学习 AutoML

VALSE

8+阅读 · 2019年2月28日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

【推荐】自然语言处理（NLP）指南

【推荐】自然语言处理（NLP）指南

机器学习研究会

35+阅读 · 2017年11月17日

可解释的CNN

可解释的CNN

CreateAMind

17+阅读 · 2017年10月5日

【泡泡机器人公开课】第八十六课多相机视觉SLAM介绍 - 杨绍武

【泡泡机器人公开课】第八十六课多相机视觉SLAM介绍 - 杨绍武

机器学习研究会

4+阅读 · 2017年9月20日

相关论文

Diformer: Directional Transformer for Neural Machine Translation

Arxiv

0+阅读 · 2021年12月31日

The Classical Multidimensional Scaling Revisited

Arxiv

0+阅读 · 2021年12月29日

Multi-Dialect Arabic Speech Recognition

Arxiv

0+阅读 · 2021年12月25日

Efficient Data-specific Model Search for Collaborative Filtering

Efficient Data-specific Model Search for Collaborative Filtering

Arxiv

4+阅读 · 2021年6月14日

End-to-End Multi-speaker Speech Recognition with Transformer

Arxiv

8+阅读 · 2020年2月13日

Insertion-based Decoding with automatically Inferred Generation Order

Arxiv

5+阅读 · 2019年2月28日

End-to-end Speech Recognition with Word-based RNN Language Models

End-to-end Speech Recognition with Word-based RNN Language Models

Arxiv

3+阅读 · 2018年8月8日

Cross-type Biomedical Named Entity Recognition with Deep Multi-Task Learning

Arxiv

10+阅读 · 2018年1月30日

Multilingual Training and Cross-lingual Adaptation on CTC-based Acoustic Model

Arxiv

7+阅读 · 2018年1月23日

State-of-the-art Speech Recognition With Sequence-to-Sequence Models

Arxiv

7+阅读 · 2018年1月18日

微信扫码咨询专知VIP会员