Knowledge distillation is an effective and stable method for model compression via knowledge transfer. Conventional knowledge distillation (KD) transfers knowledge from a large, well pre-trained teacher network to a small student network in a one-way process. Recently, deep mutual learning (DML) has been proposed to let student networks learn collaboratively and simultaneously. However, to the best of our knowledge, KD and DML have never been jointly explored in a unified framework to solve the knowledge distillation problem. In this paper, we observe that the teacher model provides more trustworthy supervision signals in KD, while the student captures behaviors more similar to the teacher's in DML. Based on these observations, we first propose to combine KD with DML in a unified framework. Furthermore, we propose a Semi-Online Knowledge Distillation (SOKD) method that effectively improves the performance of both the student and the teacher. In this method, we introduce the peer-teaching training scheme from DML to alleviate the student's imitation difficulty, while also leveraging the supervision signals provided by the well-trained teacher in KD. In addition, we show that our framework can be easily extended to feature-based distillation methods. Extensive experiments on the CIFAR-100 and ImageNet datasets demonstrate that the proposed method achieves state-of-the-art performance.
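To make the combination of KD and DML concrete, below is a minimal training-step sketch, assuming a PyTorch setup with a frozen pre-trained teacher and two student peers. The function names, the loss weights `alpha`/`beta`, and the temperature `T` are illustrative assumptions, not the exact SOKD formulation described in the paper.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, target_logits, T=4.0):
    """Standard softened KL-divergence distillation loss."""
    p_t = F.softmax(target_logits / T, dim=1)
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * (T * T)

def train_step(teacher, student_a, student_b, optimizer, images, labels,
               alpha=0.5, beta=0.5, T=4.0):
    """One step combining offline KD (frozen teacher) with DML-style mutual learning."""
    teacher.eval()
    with torch.no_grad():  # the teacher is frozen, as in conventional KD
        t_logits = teacher(images)

    a_logits = student_a(images)
    b_logits = student_b(images)

    # Supervised cross-entropy with ground-truth labels for both peers.
    ce = F.cross_entropy(a_logits, labels) + F.cross_entropy(b_logits, labels)

    # KD term: both peers imitate the trustworthy soft targets of the teacher.
    kd = kd_loss(a_logits, t_logits, T) + kd_loss(b_logits, t_logits, T)

    # DML term: the peers mutually imitate each other's current predictions.
    dml = (kd_loss(a_logits, b_logits.detach(), T)
           + kd_loss(b_logits, a_logits.detach(), T))

    loss = ce + alpha * kd + beta * dml
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch the KD term supplies the stable teacher supervision, while the DML term provides the peer-teaching signal that eases imitation; the relative weighting is a hypothetical choice for illustration.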