We propose a simple yet effective method to compress an RNN-Transducer (RNN-T) through the well-known knowledge distillation paradigm. We show that the transducer's encoder outputs naturally have high entropy and contain rich information about acoustically similar word-piece confusions. This rich information is suppressed when combined with the lower-entropy decoder outputs to produce the joint network logits. Consequently, we introduce an auxiliary loss to distill the encoder logits from a teacher transducer's encoder, and explore training strategies under which this encoder distillation is effective. We find that tandem training of teacher and student encoders with in-place encoder distillation outperforms the use of a pre-trained, static teacher transducer. We also report an interesting phenomenon, which we refer to as implicit distillation, that occurs when the teacher and student encoders share the same decoder. Our experiments show 5.37-8.4% relative word error rate reductions (WERRs) on in-house test sets, and 5.05-6.18% relative WERRs on LibriSpeech test sets.
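Since the abstract describes an auxiliary loss that distills the teacher encoder's logits into the student encoder, the following is a minimal sketch of one plausible form of such a loss. It assumes both encoders' outputs are projected to the same word-piece vocabulary; the temperature and auxiliary weight are hypothetical hyperparameters, not values taken from the paper.

```python
import torch
import torch.nn.functional as F


def encoder_distillation_loss(student_logits: torch.Tensor,
                              teacher_logits: torch.Tensor,
                              temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between teacher and student encoder output distributions.

    student_logits, teacher_logits: (batch, time, vocab) encoder outputs
    projected to the word-piece vocabulary. `temperature` is illustrative.
    """
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    # Detach the teacher so gradients only flow into the student encoder.
    teacher_probs = F.softmax(teacher_logits.detach() / t, dim=-1)
    # Scale by t^2 to keep gradient magnitudes comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * (t * t)


# Hypothetical usage: add the distillation term to the standard RNN-T loss,
# where `aux_weight` is an assumed tuning knob.
# total_loss = rnnt_loss + aux_weight * encoder_distillation_loss(
#     student_encoder_out, teacher_encoder_out)
```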