Knowledge distillation is classically a procedure in which a neural network is trained on the outputs of another network along with the original targets in order to transfer knowledge between the architectures. The special case of self-distillation, where the network architectures are identical, has been observed to improve generalization accuracy. In this paper, we consider an iterative variant of self-distillation in a kernel regression setting, in which successive steps incorporate both model outputs and the ground-truth targets. This allows us to provide the first theoretical results on the importance of using weighted ground-truth targets in self-distillation. Our focus is on fitting nonlinear functions to training data with a weighted mean squared error objective suitable for distillation, subject to $\ell_2$ regularization of the model parameters. We show that any such function obtained with self-distillation can be calculated directly as a function of the initial fit, and that taking infinitely many distillation steps yields the same optimization problem as the original, but with amplified regularization. Furthermore, we provide a closed-form solution for the optimal choice of weighting parameter at each step, and show how to efficiently estimate this weighting parameter for deep learning, significantly reducing the computational requirements compared to a grid search.
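To make the iterative procedure concrete, the following is a minimal NumPy sketch rather than the paper's exact formulation: it assumes each distillation step refits kernel ridge regression to a convex combination $r\,y + (1-r)\,f_{t-1}$ of the ground-truth targets and the previous model's outputs on the training points, with an illustrative mixing parameter $r$; the function names, the mixing rule, and the toy RBF-kernel data are hypothetical.

```python
import numpy as np

def kernel_ridge_fit(K, targets, lam):
    # Dual solution of kernel ridge regression: (K + lam * I)^{-1} targets.
    n = K.shape[0]
    return np.linalg.solve(K + lam * np.eye(n), targets)

def iterative_self_distillation(K, y, lam, mix_weights):
    # Step 0: fit the ground-truth targets directly.
    coefs = kernel_ridge_fit(K, y, lam)
    # Each subsequent step refits a weighted mix of the ground truth and the
    # previous model's outputs on the training points (assumed mixing rule).
    for r in mix_weights:
        prev_outputs = K @ coefs
        targets = r * y + (1.0 - r) * prev_outputs
        coefs = kernel_ridge_fit(K, targets, lam)
    return coefs

# Toy usage with an RBF kernel on noisy 1-D data.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(50, 1))
y = np.sin(3.0 * X[:, 0]) + 0.1 * rng.standard_normal(50)
K = np.exp(-((X - X.T) ** 2) / 0.1)
distilled_coefs = iterative_self_distillation(K, y, lam=1e-2, mix_weights=[0.5, 0.5, 0.5])
```

In this sketch, setting every entry of `mix_weights` to 0 recovers pure self-distillation on model outputs, while setting them to 1 simply refits the original targets; intermediate values correspond to the weighted ground-truth targets discussed in the abstract.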