The remarkable performance of pre-trained language models (LMs) trained with self-supervised learning has led to a major paradigm shift in natural language processing research. In line with this shift, improving speech recognition systems with massive deep-learning-based LMs has become a major topic of speech recognition research. Among the various methods for applying LMs to speech recognition systems, in this paper we focus on a cross-modal knowledge distillation method that transfers knowledge between two deep neural networks of different modalities. We propose an acoustic model structure with multiple auxiliary output layers for cross-modal distillation and demonstrate that the proposed method effectively compensates for the shortcomings of the existing label-interpolation-based distillation method. In addition, we extend the proposed method to a hierarchical distillation method that uses LMs trained on different units (senones, monophones, and subwords) and show the effectiveness of the hierarchical distillation method through an ablation study.
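To make the described architecture concrete, the following is a minimal sketch of one plausible realization: a shared acoustic encoder with a main senone output layer plus auxiliary monophone and subword heads, each trained with a label-interpolation-style objective that mixes hard-label cross-entropy with soft targets supplied by a unit-matched LM teacher. The encoder type, vocabulary sizes, and hyperparameters (`alpha`, `tau`) are illustrative assumptions, not the paper's exact configuration, and the actual training objective of the proposed method may differ.

```python
# Illustrative sketch only (not the authors' released code): an acoustic model
# with multiple auxiliary output heads for hierarchical cross-modal distillation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadAcousticModel(nn.Module):
    def __init__(self, feat_dim=80, hidden=512, n_senones=4000,
                 n_monophones=40, n_subwords=1000):
        super().__init__()
        # Shared acoustic encoder (an LSTM stack is assumed for illustration).
        self.encoder = nn.LSTM(feat_dim, hidden, num_layers=4, batch_first=True)
        # Main output layer (senone posteriors) plus auxiliary heads at
        # coarser/finer linguistic units for hierarchical distillation.
        self.senone_head = nn.Linear(hidden, n_senones)
        self.monophone_head = nn.Linear(hidden, n_monophones)
        self.subword_head = nn.Linear(hidden, n_subwords)

    def forward(self, feats):
        h, _ = self.encoder(feats)                  # (B, T, hidden)
        return {
            "senone": self.senone_head(h),
            "monophone": self.monophone_head(h),
            "subword": self.subword_head(h),
        }

def distillation_loss(logits, hard_targets, lm_soft_targets, alpha=0.5, tau=1.0):
    """Per-head loss: hard-label cross-entropy interpolated with a KL term
    against soft targets from an LM trained on the matching unit."""
    # logits: (B, T, C); hard_targets: (B, T); lm_soft_targets: (B, T, C) probs.
    ce = F.cross_entropy(logits.transpose(1, 2), hard_targets)
    kld = F.kl_div(F.log_softmax(logits / tau, dim=-1),
                   lm_soft_targets, reduction="batchmean") * tau * tau
    return (1.0 - alpha) * ce + alpha * kld
```

In this sketch the per-head losses would simply be summed (possibly with per-unit weights) to train the shared encoder; how the actual paper weights the senone, monophone, and subword heads is not specified in the abstract.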