逐步完善目标并逐步完善自我知识蒸馏 (Self-Knowledge Distillation with Progressive Refinement of Targets) - 专知论文

会员服务 ·

0

硬目标 · 蒸馏 · 正则化项 · 泛化理论 · Extensibility ·

2021 年 5 月 1 日

Self-Knowledge Distillation with Progressive Refinement of Targets

翻译：逐步完善目标并逐步完善自我知识蒸馏

Kyungyul Kim,ByeongMoon Ji,Doyoung Yoon,Sangheum Hwang

from arxiv, Under review

The generalization capability of deep neural networks has been substantially improved by applying a wide spectrum of regularization methods, e.g., restricting function space, injecting randomness during training, augmenting data, etc. In this work, we propose a simple yet effective regularization method named \textit{progressive self-knowledge distillation} (PS-KD), which progressively distills a model's own knowledge to soften hard targets (i.e., one-hot vectors) during training. Hence, it can be interpreted within a framework of knowledge distillation as a student becomes a teacher itself. Specifically, targets are adjusted adaptively by combining the ground-truth and past predictions from the model itself. We show that PS-KD provides an effect of hard example mining by rescaling gradients according to difficulty in classifying examples. The proposed method is applicable to any supervised learning tasks with hard targets and can be easily combined with existing regularization methods to further enhance the generalization performance. Furthermore, it is confirmed that PS-KD achieves not only better accuracy, but also provides high quality of confidence estimates in terms of calibration as well as ordinal ranking. Extensive experimental results on three different tasks, image classification, object detection, and machine translation, demonstrate that our method consistently improves the performance of the state-of-the-art baselines. The code will be released.

翻译：通过应用广泛的正规化方法,例如限制功能空间、在培训期间注射随机性、增加数据等等,深神经网络的普遍化能力已大为改善。在这项工作中,我们提议了一个简单而有效的正规化方法,名为\textit{逐步自我知识蒸馏}(PS-KD),该方法逐步提炼出模型本身的知识,以在培训期间软化硬目标(即一热矢量),因此可以在知识蒸馏框架内解释,因为学生本身成为教师。具体地说,通过将模型本身的地面真实性和以往预测结合起来来调整目标。我们表明,PS-KD通过调整梯度以调整梯度来提供硬性实例挖掘效应,以难于对实例进行分类。拟议方法适用于任何具有硬目标的监督性学习任务,并且很容易与现有的正规化方法相结合,以进一步提高总体化绩效。此外,PS-KD不仅实现了更高的准确性,而且还提供了高质量的信心估算,通过将地面和过去的预测结果结合起来。我们表明,PS-KD提供了硬性实例的挖掘效果,作为不断升级的模型的升级和升级,作为不断改进的排序。

0

相关内容

硬目标

知识图谱嵌入模型的概率标定,Probability Calibration for Knowledge Graph Embedding Models

专知会员服务

36+阅读 · 2020年5月11日

【CVPR2020-浙江大学-阿里巴巴】深层知识迁移的深层归因图，DEPARA: Deep Attribution Graph for Deep Knowledge Transferability

【CVPR2020-浙江大学-阿里巴巴】深层知识迁移的深层归因图，DEPARA: Deep Attribution Graph for Deep Knowledge Transferability

专知会员服务

29+阅读 · 2020年4月17日

图卷积神经网络蒸馏知识，Distillating Knowledge from GCN

图卷积神经网络蒸馏知识，Distillating Knowledge from GCN

专知会员服务

96+阅读 · 2020年3月25日

Transformer文本分类代码

Transformer文本分类代码

专知会员服务

118+阅读 · 2020年2月3日

【ICLR2020】理解非自回归机器翻译中的知识蒸馏（Understanding Knowledge Distillation in Non-autoregressive Machine Translation）

【ICLR2020】理解非自回归机器翻译中的知识蒸馏（Understanding Knowledge Distillation in Non-autoregressive Machine Translation）

专知会员服务

11+阅读 · 2019年12月28日

【百度】上下文化知识图谱嵌入，CoKE: Contextualized Knowledge Graph Embedding

【百度】上下文化知识图谱嵌入，CoKE: Contextualized Knowledge Graph Embedding

专知会员服务

80+阅读 · 2019年11月8日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

条件GAN重大改进！cGANs with Projection Discriminator

条件GAN重大改进！cGANs with Projection Discriminator

CreateAMind

8+阅读 · 2018年2月7日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

Sparsistent Model Discovery

Sparsistent Model Discovery

Arxiv

0+阅读 · 2021年6月22日

Learning to Inference with Early Exit in the Progressive Speech Enhancement

Learning to Inference with Early Exit in the Progressive Speech Enhancement

Arxiv

0+阅读 · 2021年6月22日

A Coarse-to-Fine Instance Segmentation Network with Learning Boundary Representation

A Coarse-to-Fine Instance Segmentation Network with Learning Boundary Representation

Arxiv

0+阅读 · 2021年6月18日

Knowledge Distillation in Wide Neural Networks: Risk Bound, Data Efficiency and Imperfect Teacher

Arxiv

4+阅读 · 2020年10月20日

Kernel Based Progressive Distillation for Adder Neural Networks

Arxiv

5+阅读 · 2020年9月29日

Knowledge Distillation from Internal Representations

Knowledge Distillation from Internal Representations

Arxiv

4+阅读 · 2019年10月8日

SCEF: A Support-Confidence-aware Embedding Framework for Knowledge Graph Refinement

SCEF: A Support-Confidence-aware Embedding Framework for Knowledge Graph Refinement

Arxiv

7+阅读 · 2019年2月18日

Better and Faster: Knowledge Transfer from Multiple Self-supervised Learning Tasks via Graph Distillation for Video Classification

Arxiv

3+阅读 · 2018年4月26日

Learning a Deep Listwise Context Model for Ranking Refinement

Arxiv

4+阅读 · 2018年4月23日

Neural Response Generation with Dynamic Vocabularies

Arxiv

5+阅读 · 2017年11月30日

VIP会员

文章信息

相关主题

相关VIP内容

知识图谱嵌入模型的概率标定,Probability Calibration for Knowledge Graph Embedding Models

专知会员服务

36+阅读 · 2020年5月11日

【CVPR2020-浙江大学-阿里巴巴】深层知识迁移的深层归因图，DEPARA: Deep Attribution Graph for Deep Knowledge Transferability

【CVPR2020-浙江大学-阿里巴巴】深层知识迁移的深层归因图，DEPARA: Deep Attribution Graph for Deep Knowledge Transferability

专知会员服务

29+阅读 · 2020年4月17日

图卷积神经网络蒸馏知识，Distillating Knowledge from GCN

图卷积神经网络蒸馏知识，Distillating Knowledge from GCN

专知会员服务

96+阅读 · 2020年3月25日

Transformer文本分类代码

Transformer文本分类代码

专知会员服务

118+阅读 · 2020年2月3日

【ICLR2020】理解非自回归机器翻译中的知识蒸馏（Understanding Knowledge Distillation in Non-autoregressive Machine Translation）

【ICLR2020】理解非自回归机器翻译中的知识蒸馏（Understanding Knowledge Distillation in Non-autoregressive Machine Translation）

专知会员服务

11+阅读 · 2019年12月28日

【百度】上下文化知识图谱嵌入，CoKE: Contextualized Knowledge Graph Embedding

【百度】上下文化知识图谱嵌入，CoKE: Contextualized Knowledge Graph Embedding

专知会员服务

80+阅读 · 2019年11月8日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

新书册《几何深度学习的数学基础》

中程单向攻击无人机的战略意义：俄乌战争启示

在无标注条件下适配视觉—语言模型：全面综述

面向视觉语言模型的持续学习：遗忘之外的综述与分类体系

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

条件GAN重大改进！cGANs with Projection Discriminator

条件GAN重大改进！cGANs with Projection Discriminator

CreateAMind

8+阅读 · 2018年2月7日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

Sparsistent Model Discovery

Sparsistent Model Discovery

Arxiv

0+阅读 · 2021年6月22日

Learning to Inference with Early Exit in the Progressive Speech Enhancement

Learning to Inference with Early Exit in the Progressive Speech Enhancement

Arxiv

0+阅读 · 2021年6月22日

A Coarse-to-Fine Instance Segmentation Network with Learning Boundary Representation

A Coarse-to-Fine Instance Segmentation Network with Learning Boundary Representation

Arxiv

0+阅读 · 2021年6月18日

Knowledge Distillation in Wide Neural Networks: Risk Bound, Data Efficiency and Imperfect Teacher

Arxiv

4+阅读 · 2020年10月20日

Kernel Based Progressive Distillation for Adder Neural Networks

Arxiv

5+阅读 · 2020年9月29日

Knowledge Distillation from Internal Representations

Knowledge Distillation from Internal Representations

Arxiv

4+阅读 · 2019年10月8日

SCEF: A Support-Confidence-aware Embedding Framework for Knowledge Graph Refinement

SCEF: A Support-Confidence-aware Embedding Framework for Knowledge Graph Refinement

Arxiv

7+阅读 · 2019年2月18日

Better and Faster: Knowledge Transfer from Multiple Self-supervised Learning Tasks via Graph Distillation for Video Classification

Arxiv

3+阅读 · 2018年4月26日

Learning a Deep Listwise Context Model for Ranking Refinement

Arxiv

4+阅读 · 2018年4月23日

Neural Response Generation with Dynamic Vocabularies

Arxiv

5+阅读 · 2017年11月30日

微信扫码咨询专知VIP会员