Robust Knowledge Distillation from RNN-T Models With Noisy Training Labels Using Full-Sum Loss

Distillation · Knowledge · MoDELS · Robustness · SOFT

March 10, 2023


Mohammad Zeineldeen, Kartik Audhkhasi, Murali Karthick Baskar, Bhuvana Ramabhadran

From arXiv; accepted at ICASSP 2023

This work studies knowledge distillation (KD) and addresses its constraints for recurrent neural network transducer (RNN-T) models. In hard distillation, a teacher model transcribes large amounts of unlabelled speech to train a student model. Soft distillation is another popular KD method that distills the output logits of the teacher model. Due to the nature of RNN-T alignments, applying soft distillation between RNN-T architectures having different posterior distributions is challenging. In addition, bad teachers with a high word error rate (WER) reduce the efficacy of KD. We investigate how to effectively distill knowledge from variable-quality ASR teachers, which, to the best of our knowledge, has not been studied before. We show that a sequence-level KD method, full-sum distillation, outperforms other distillation methods for RNN-T models, especially for bad teachers. We also propose a variant of full-sum distillation that distills the sequence-discriminative knowledge of the teacher, leading to further improvement in WER. We conduct experiments on public datasets, namely SpeechStew and LibriSpeech, and on in-house production data.
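To make the contrast between soft and sequence-level distillation concrete, here is a minimal NumPy sketch. It assumes the standard RNN-T lattice, in which a model emits a posterior over the vocabulary at every node (t, u), and uses the well-known forward recursion to sum over all alignments of a label sequence. The function names, the array layout `(T, U+1, V)`, and the `sequence_level_gap` comparison are illustrative assumptions; the abstract does not give the authors' exact loss, so none of this should be read as their implementation.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the last (vocabulary) axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def soft_distillation_loss(teacher_logits, student_logits):
    """Mean KL(teacher || student) over lattice nodes.

    Both inputs have shape (T, U+1, V). This node-by-node comparison is
    why soft distillation needs the two lattices to line up.
    """
    t = log_softmax(teacher_logits)
    s = log_softmax(student_logits)
    return float((np.exp(t) * (t - s)).sum(axis=-1).mean())


def rnnt_full_sum_log_prob(log_probs, labels, blank=0):
    """log P(labels | x), summed over all RNN-T alignments.

    log_probs: array (T, U+1, V) of log posteriors at lattice node (t, u).
    labels:    sequence of U non-blank label indices.
    """
    T, U1, _ = log_probs.shape
    assert U1 == len(labels) + 1
    alpha = np.full((T, U1), -np.inf)
    alpha[0, 0] = 0.0
    for t in range(T):
        for u in range(U1):
            if t == 0 and u == 0:
                continue
            terms = []
            if t > 0:  # arrive by emitting blank at (t-1, u)
                terms.append(alpha[t - 1, u] + log_probs[t - 1, u, blank])
            if u > 0:  # arrive by emitting labels[u-1] at (t, u-1)
                terms.append(alpha[t, u - 1] + log_probs[t, u - 1, labels[u - 1]])
            alpha[t, u] = np.logaddexp.reduce(terms)
    # A final blank at the last node terminates the sequence.
    return float(alpha[T - 1, U1 - 1] + log_probs[T - 1, U1 - 1, blank])


def sequence_level_gap(teacher_log_probs, student_log_probs, labels):
    """Hypothetical comparison of sequence-level scores (not the paper's loss)."""
    teacher_lp = rnnt_full_sum_log_prob(teacher_log_probs, labels)
    student_lp = rnnt_full_sum_log_prob(student_log_probs, labels)
    return abs(teacher_lp - student_lp)
```

The point of the sketch is the difference in granularity: `soft_distillation_loss` compares distributions at each lattice node, while `rnnt_full_sum_log_prob` reduces each model to a single score per label sequence, so a comparison built on it does not depend on the teacher and student agreeing at individual alignment positions.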

