学生帮助教师:通过自我知识蒸馏的教师演变 (Student Helping Teacher: Teacher Evolution via Self-Knowledge Distillation) - 专知论文

会员服务 ·

0

Student Networks · 蒸馏 · Networking · Extensibility · ImageNet (数据集) ·

2021 年 10 月 1 日

Student Helping Teacher: Teacher Evolution via Self-Knowledge Distillation

翻译：学生帮助教师:通过自我知识蒸馏的教师演变

Zheng Li,Xiang Li,Lingfeng Zhang,Jian Yang,Zhigeng Pan

Knowledge distillation usually transfers the knowledge from a pre-trained cumbersome teacher network to a compact student network, which follows the classical teacher-teaching-student paradigm. Based on this paradigm, previous methods mostly focus on how to efficiently train a better student network for deployment. Different from the existing practices, in this paper, we propose a novel student-helping-teacher formula, Teacher Evolution via Self-Knowledge Distillation (TESKD), where the target teacher (for deployment) is learned with the help of multiple hierarchical students by sharing the structural backbone. The diverse feedback from multiple students allows the teacher to improve itself through the shared feature representations. The effectiveness of our proposed framework is demonstrated by extensive experiments with various network settings on two standard benchmarks including CIFAR-100 and ImageNet. Notably, when trained together with our proposed method, ResNet-18 achieves 79.15% and 71.14% accuracy on CIFAR-100 and ImageNet, outperforming the baseline results by 4.74% and 1.43%, respectively. The code is available at: https://github.com/zhengli427/TESKD.

翻译：知识蒸馏通常将知识从经过培训的烦琐教师网络转移到紧凑的学生网络,这遵循了古典教师-教师-学生模式。基于这一范例,以往方法主要侧重于如何有效培训更好的学生网络,以便部署。与现有做法不同,我们在本文件中提出了新的学生-助学-教师公式,即通过自学蒸馏的教师演变(TESKD),通过共享结构骨干,在多等级学生的帮助下学习了目标教师(供部署)。来自多个学生的反馈使得教师能够通过共享特征表达方式自我改进。我们拟议框架的有效性通过在两个标准基准基准上对各种网络进行的广泛实验得到证明,包括CIFAR-100和图像网络。值得注意的是,ResNet-18与我们拟议的方法一起培训后,在CIFAR-100和图像网络上实现了79.15%和71.14%的准确率,分别比基准结果高出4.74%和1.43%。代码见https://github.com/zheng427/TESKD。

0

相关内容

Student Networks

Student Networks

【ICML2021】轻量级结构多样化的网络结构

专知会员服务

28+阅读 · 2021年8月2日

对比学习简述

专知会员服务

90+阅读 · 2021年6月29日

【ICML2021】异质风险最小化，Heterogeneous Risk Minimization

专知会员服务

16+阅读 · 2021年5月21日

【AAAI2021】LRC-BERT：对比学习潜在语义知识蒸馏的自然语言理解

专知会员服务

27+阅读 · 2020年12月31日

【知识图谱嵌入补全综述论文】embedding models for knowledge base completion

【知识图谱嵌入补全综述论文】embedding models for knowledge base completion

专知会员服务

103+阅读 · 2020年4月25日

图卷积神经网络蒸馏知识，Distillating Knowledge from GCN

图卷积神经网络蒸馏知识，Distillating Knowledge from GCN

专知会员服务

96+阅读 · 2020年3月25日

元迁移学习的小样本学习，Meta-transfer Learning for Few-shot Learning

元迁移学习的小样本学习，Meta-transfer Learning for Few-shot Learning

专知会员服务

159+阅读 · 2020年2月29日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

条件GAN重大改进！cGANs with Projection Discriminator

条件GAN重大改进！cGANs with Projection Discriminator

CreateAMind

8+阅读 · 2018年2月7日

【论文】图上的表示学习综述

【论文】图上的表示学习综述

机器学习研究会

15+阅读 · 2017年9月24日

Focal and Global Knowledge Distillation for Detectors

Arxiv

0+阅读 · 2021年11月23日

Semi-Online Knowledge Distillation

Arxiv

0+阅读 · 2021年11月23日

Topology Distillation for Recommender System

Arxiv

9+阅读 · 2021年6月16日

Distilling Audio-Visual Knowledge by Compositional Contrastive Learning

Arxiv

5+阅读 · 2021年4月22日

Progressive Network Grafting for Few-Shot Knowledge Distillation

Progressive Network Grafting for Few-Shot Knowledge Distillation

Arxiv

4+阅读 · 2020年12月9日

A Simple Framework for Contrastive Learning of Visual Representations

Arxiv

21+阅读 · 2020年2月13日

Self-Supervised Learning For Few-Shot Image Classification

Self-Supervised Learning For Few-Shot Image Classification

Arxiv

19+阅读 · 2019年11月14日

Knowledge Distillation from Internal Representations

Knowledge Distillation from Internal Representations

Arxiv

4+阅读 · 2019年10月8日

Better and Faster: Knowledge Transfer from Multiple Self-supervised Learning Tasks via Graph Distillation for Video Classification

Arxiv

3+阅读 · 2018年4月26日

Net2Net: Accelerating Learning via Knowledge Transfer

Arxiv

3+阅读 · 2016年4月23日

VIP会员

文章信息

相关主题

Student Networks

ImageNet (数据集)

相关VIP内容

【ICML2021】轻量级结构多样化的网络结构

专知会员服务

28+阅读 · 2021年8月2日

对比学习简述

专知会员服务

90+阅读 · 2021年6月29日

【ICML2021】异质风险最小化，Heterogeneous Risk Minimization

专知会员服务

16+阅读 · 2021年5月21日

【AAAI2021】LRC-BERT：对比学习潜在语义知识蒸馏的自然语言理解

专知会员服务

27+阅读 · 2020年12月31日

【知识图谱嵌入补全综述论文】embedding models for knowledge base completion

【知识图谱嵌入补全综述论文】embedding models for knowledge base completion

专知会员服务

103+阅读 · 2020年4月25日

图卷积神经网络蒸馏知识，Distillating Knowledge from GCN

图卷积神经网络蒸馏知识，Distillating Knowledge from GCN

专知会员服务

96+阅读 · 2020年3月25日

元迁移学习的小样本学习，Meta-transfer Learning for Few-shot Learning

元迁移学习的小样本学习，Meta-transfer Learning for Few-shot Learning

专知会员服务

159+阅读 · 2020年2月29日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《在单一作战合成环境（SSE）中运用人工智能与大型语言模型以提供灵活人文地形及可信角色组》报告

《俄罗斯的未来战争方式第二部分：核威慑》报告

《提示战争：大语言模型如何决定军事干预》报告

《俄罗斯的未来战争方式第三部分：军事改革》报告

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

条件GAN重大改进！cGANs with Projection Discriminator

条件GAN重大改进！cGANs with Projection Discriminator

CreateAMind

8+阅读 · 2018年2月7日

【论文】图上的表示学习综述

【论文】图上的表示学习综述

机器学习研究会

15+阅读 · 2017年9月24日

相关论文

Focal and Global Knowledge Distillation for Detectors

Arxiv

0+阅读 · 2021年11月23日

Semi-Online Knowledge Distillation

Arxiv

0+阅读 · 2021年11月23日

Topology Distillation for Recommender System

Arxiv

9+阅读 · 2021年6月16日

Distilling Audio-Visual Knowledge by Compositional Contrastive Learning

Arxiv

5+阅读 · 2021年4月22日

Progressive Network Grafting for Few-Shot Knowledge Distillation

Progressive Network Grafting for Few-Shot Knowledge Distillation

Arxiv

4+阅读 · 2020年12月9日

A Simple Framework for Contrastive Learning of Visual Representations

Arxiv

21+阅读 · 2020年2月13日

Self-Supervised Learning For Few-Shot Image Classification

Self-Supervised Learning For Few-Shot Image Classification

Arxiv

19+阅读 · 2019年11月14日

Knowledge Distillation from Internal Representations

Knowledge Distillation from Internal Representations

Arxiv

4+阅读 · 2019年10月8日

Better and Faster: Knowledge Transfer from Multiple Self-supervised Learning Tasks via Graph Distillation for Video Classification

Arxiv

3+阅读 · 2018年4月26日

Net2Net: Accelerating Learning via Knowledge Transfer

Arxiv

3+阅读 · 2016年4月23日

微信扫码咨询专知VIP会员