了解知识蒸馏 (Towards Understanding Knowledge Distillation) - 专知论文

会员服务 ·

0

蒸馏 · 期望风险 · 线性的 · 可理解性 · 优化器 ·

2021 年 5 月 27 日

Towards Understanding Knowledge Distillation

翻译：了解知识蒸馏

Mary Phuong,Christoph H. Lampert

from arxiv, ICML'19. Post-edited to add related work. arXiv admin note: text overlap with arXiv:2003.13438 by other authors

Knowledge distillation, i.e., one classifier being trained on the outputs of another classifier, is an empirically very successful technique for knowledge transfer between classifiers. It has even been observed that classifiers learn much faster and more reliably if trained with the outputs of another classifier as soft labels, instead of from ground truth data. So far, however, there is no satisfactory theoretical explanation of this phenomenon. In this work, we provide the first insights into the working mechanisms of distillation by studying the special case of linear and deep linear classifiers. Specifically, we prove a generalization bound that establishes fast convergence of the expected risk of a distillation-trained linear classifier. From the bound and its proof we extract three key factors that determine the success of distillation: * data geometry -- geometric properties of the data distribution, in particular class separation, has a direct influence on the convergence speed of the risk; * optimization bias -- gradient descent optimization finds a very favorable minimum of the distillation objective; and * strong monotonicity -- the expected risk of the student classifier always decreases when the size of the training set grows.

翻译：一种知识蒸馏法,即正在用另一个分类器的输出物进行知识蒸馏法培训的一个分类器,是一种经验上非常成功的分类器之间知识转让技术。甚至有人观察到,如果用另一个分类器的输出物作为软标签而不是地面真实数据进行培训,分类器的学习速度更快、更可靠。然而,迄今为止,对于这一现象还没有令人满意的理论解释。在这项工作中,我们通过研究线性和深度线性分类器的特殊案例,对蒸馏工作机制提供了初步的洞察力。具体地说,我们证明,我们有一个概括性约束,可以迅速将蒸馏训练过的线性分类器的预期风险集中起来。从这一界限及其证据中我们提取了三个关键因素,决定蒸馏成功与否:* 数据几何测量法 -- -- 数据分布的几何特性,特别是等级分解,对风险的趋同速度有直接影响;* 优化偏差 -- 梯度下降最有利于蒸馏目标;以及* 强大的单调度 -- -- 预计学生分类器在训练场变大时总是会降低风险。

0

相关内容

剑桥大学《数据科学: 原理与实践》课程，附PPT下载

剑桥大学《数据科学: 原理与实践》课程，附PPT下载

专知会员服务

53+阅读 · 2021年1月20日

【EMNLP2020-清华】基于常识知识图谱的多跳推理语言生成

【EMNLP2020-清华】基于常识知识图谱的多跳推理语言生成

专知会员服务

74+阅读 · 2020年9月25日

【KDD2020】更深的图神经网络，Towards Deeper Graph Neural Networks

【KDD2020】更深的图神经网络，Towards Deeper Graph Neural Networks

专知会员服务

90+阅读 · 2020年7月22日

【快讯】ECCV 2020论文出炉，1361篇上榜，你的paper中了吗？

【快讯】ECCV 2020论文出炉，1361篇上榜，你的paper中了吗？

专知会员服务

57+阅读 · 2020年7月3日

KG-BERT：基于BERT的知识图谱补全，KG-BERT: BERT for Knowledge Graph Completion

KG-BERT：基于BERT的知识图谱补全，KG-BERT: BERT for Knowledge Graph Completion

专知会员服务

195+阅读 · 2020年5月31日

【IJCAI2020】从语言图谱到常识图谱，TransOMCS: From Linguistic Graphs to Commonsense Knowledge

【IJCAI2020】从语言图谱到常识图谱，TransOMCS: From Linguistic Graphs to Commonsense Knowledge

专知会员服务

26+阅读 · 2020年5月6日

【知识图谱嵌入补全综述论文】embedding models for knowledge base completion

【知识图谱嵌入补全综述论文】embedding models for knowledge base completion

专知会员服务

101+阅读 · 2020年4月25日

图卷积神经网络蒸馏知识，Distillating Knowledge from GCN

图卷积神经网络蒸馏知识，Distillating Knowledge from GCN

专知会员服务

95+阅读 · 2020年3月25日

【ICLR2020】理解非自回归机器翻译中的知识蒸馏（Understanding Knowledge Distillation in Non-autoregressive Machine Translation）

【ICLR2020】理解非自回归机器翻译中的知识蒸馏（Understanding Knowledge Distillation in Non-autoregressive Machine Translation）

专知会员服务

11+阅读 · 2019年12月28日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

155+阅读 · 2019年10月12日

【论文笔记】通俗理解少样本文本分类 (Few-Shot Text Classification) (1)

【论文笔记】通俗理解少样本文本分类 (Few-Shot Text Classification) (1)

深度学习自然语言处理

7+阅读 · 2020年4月8日

BERT 瘦身之路：Distillation，Quantization，Pruning

BERT 瘦身之路：Distillation，Quantization，Pruning

AINLP

10+阅读 · 2019年10月22日

意识是一种数学模式

意识是一种数学模式

CreateAMind

3+阅读 · 2019年6月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

26+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

28+阅读 · 2019年5月18日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

分布式TensorFlow入门指南

分布式TensorFlow入门指南

机器学习研究会

4+阅读 · 2017年11月28日

【推荐】自然语言处理（NLP）指南

【推荐】自然语言处理（NLP）指南

机器学习研究会

35+阅读 · 2017年11月17日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

Towards a Dimension-Free Understanding of Adaptive Linear Control

Arxiv

0+阅读 · 2021年7月15日

LRC-BERT: Latent-representation Contrastive Knowledge Distillation for Natural Language Understanding

LRC-BERT: Latent-representation Contrastive Knowledge Distillation for Natural Language Understanding

Arxiv

6+阅读 · 2020年12月14日

Attribute-Guided Adversarial Training for Robustness to Natural Perturbations

Arxiv

15+阅读 · 2020年12月3日

Knowledge Distillation in Wide Neural Networks: Risk Bound, Data Efficiency and Imperfect Teacher

Arxiv

4+阅读 · 2020年10月20日

Teacher-Student Training for Robust Tacotron-based TTS

Teacher-Student Training for Robust Tacotron-based TTS

Arxiv

5+阅读 · 2019年11月7日

Contrastive Representation Distillation

Contrastive Representation Distillation

Arxiv

5+阅读 · 2019年10月23日

Knowledge Distillation from Internal Representations

Knowledge Distillation from Internal Representations

Arxiv

4+阅读 · 2019年10月8日

TinyBERT: Distilling BERT for Natural Language Understanding

TinyBERT: Distilling BERT for Natural Language Understanding

Arxiv

11+阅读 · 2019年9月23日

K-BERT: Enabling Language Representation with Knowledge Graph

K-BERT: Enabling Language Representation with Knowledge Graph

Arxiv

19+阅读 · 2019年9月17日

Visual Relationship Detection with Internal and External Linguistic Knowledge Distillation

Arxiv

3+阅读 · 2017年8月3日

VIP会员

文章信息

相关主题

相关VIP内容

剑桥大学《数据科学: 原理与实践》课程，附PPT下载

剑桥大学《数据科学: 原理与实践》课程，附PPT下载

专知会员服务

53+阅读 · 2021年1月20日

【EMNLP2020-清华】基于常识知识图谱的多跳推理语言生成

【EMNLP2020-清华】基于常识知识图谱的多跳推理语言生成

专知会员服务

74+阅读 · 2020年9月25日

【KDD2020】更深的图神经网络，Towards Deeper Graph Neural Networks

【KDD2020】更深的图神经网络，Towards Deeper Graph Neural Networks

专知会员服务

90+阅读 · 2020年7月22日

【快讯】ECCV 2020论文出炉，1361篇上榜，你的paper中了吗？

【快讯】ECCV 2020论文出炉，1361篇上榜，你的paper中了吗？

专知会员服务

57+阅读 · 2020年7月3日

KG-BERT：基于BERT的知识图谱补全，KG-BERT: BERT for Knowledge Graph Completion

KG-BERT：基于BERT的知识图谱补全，KG-BERT: BERT for Knowledge Graph Completion

专知会员服务

195+阅读 · 2020年5月31日

【IJCAI2020】从语言图谱到常识图谱，TransOMCS: From Linguistic Graphs to Commonsense Knowledge

【IJCAI2020】从语言图谱到常识图谱，TransOMCS: From Linguistic Graphs to Commonsense Knowledge

专知会员服务

26+阅读 · 2020年5月6日

【知识图谱嵌入补全综述论文】embedding models for knowledge base completion

【知识图谱嵌入补全综述论文】embedding models for knowledge base completion

专知会员服务

101+阅读 · 2020年4月25日

图卷积神经网络蒸馏知识，Distillating Knowledge from GCN

图卷积神经网络蒸馏知识，Distillating Knowledge from GCN

专知会员服务

95+阅读 · 2020年3月25日

【ICLR2020】理解非自回归机器翻译中的知识蒸馏（Understanding Knowledge Distillation in Non-autoregressive Machine Translation）

【ICLR2020】理解非自回归机器翻译中的知识蒸馏（Understanding Knowledge Distillation in Non-autoregressive Machine Translation）

专知会员服务

11+阅读 · 2019年12月28日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

155+阅读 · 2019年10月12日

热门VIP内容

开通专知VIP会员享更多权益服务

《新兴技术与风险分析：太空领域与关键基础设施》最新报告

《面向无人机实时认知任务解决的视觉-语言-动作（VLA）模型与评估基准》

《导弹干扰和交战模型》最新34页报告

《重建美国核武库的路线图》最新报告

相关资讯

【论文笔记】通俗理解少样本文本分类 (Few-Shot Text Classification) (1)

【论文笔记】通俗理解少样本文本分类 (Few-Shot Text Classification) (1)

深度学习自然语言处理

7+阅读 · 2020年4月8日

BERT 瘦身之路：Distillation，Quantization，Pruning

BERT 瘦身之路：Distillation，Quantization，Pruning

AINLP

10+阅读 · 2019年10月22日

意识是一种数学模式

意识是一种数学模式

CreateAMind

3+阅读 · 2019年6月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

26+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

28+阅读 · 2019年5月18日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

分布式TensorFlow入门指南

分布式TensorFlow入门指南

机器学习研究会

4+阅读 · 2017年11月28日

【推荐】自然语言处理（NLP）指南

【推荐】自然语言处理（NLP）指南

机器学习研究会

35+阅读 · 2017年11月17日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

Towards a Dimension-Free Understanding of Adaptive Linear Control

Arxiv

0+阅读 · 2021年7月15日

LRC-BERT: Latent-representation Contrastive Knowledge Distillation for Natural Language Understanding

LRC-BERT: Latent-representation Contrastive Knowledge Distillation for Natural Language Understanding

Arxiv

6+阅读 · 2020年12月14日

Attribute-Guided Adversarial Training for Robustness to Natural Perturbations

Arxiv

15+阅读 · 2020年12月3日

Knowledge Distillation in Wide Neural Networks: Risk Bound, Data Efficiency and Imperfect Teacher

Arxiv

4+阅读 · 2020年10月20日

Teacher-Student Training for Robust Tacotron-based TTS

Teacher-Student Training for Robust Tacotron-based TTS

Arxiv

5+阅读 · 2019年11月7日

Contrastive Representation Distillation

Contrastive Representation Distillation

Arxiv

5+阅读 · 2019年10月23日

Knowledge Distillation from Internal Representations

Knowledge Distillation from Internal Representations

Arxiv

4+阅读 · 2019年10月8日

TinyBERT: Distilling BERT for Natural Language Understanding

TinyBERT: Distilling BERT for Natural Language Understanding

Arxiv

11+阅读 · 2019年9月23日

K-BERT: Enabling Language Representation with Knowledge Graph

K-BERT: Enabling Language Representation with Knowledge Graph

Arxiv

19+阅读 · 2019年9月17日

Visual Relationship Detection with Internal and External Linguistic Knowledge Distillation

Arxiv

3+阅读 · 2017年8月3日

微信扫码咨询专知VIP会员