Knowledge distillation is a generalized logits matching technique for model compression. The equivalence between the two was previously established under the conditions of $\textit{infinite temperature}$ and $\textit{zero-mean normalization}$. In this paper, we prove that with only $\textit{infinite temperature}$, knowledge distillation is equivalent to logits matching with an extra regularization term. Furthermore, we reveal that an additional, weaker condition -- $\textit{equal-mean initialization}$ rather than the original $\textit{zero-mean normalization}$ -- already suffices to establish the equivalence. The key to our proof is the observation that, in modern neural networks trained with the cross-entropy loss and softmax activation, the mean of the back-propagated gradient on the logits always remains zero.
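As a brief sketch of the last claim (the symbols $z$, $p$, $y$, $K$, and $\mathcal{L}$ below are our own notation, not defined in the abstract): with softmax probabilities $p_i = e^{z_i} / \sum_j e^{z_j}$ and the cross-entropy loss $\mathcal{L} = -\sum_i y_i \log p_i$, the gradient with respect to each logit is
\[
\frac{\partial \mathcal{L}}{\partial z_i} = p_i - y_i ,
\]
and since both $p$ and the target distribution $y$ sum to one, the gradient over the $K$ logits averages to zero:
\[
\frac{1}{K}\sum_{i=1}^{K}\frac{\partial \mathcal{L}}{\partial z_i}
= \frac{1}{K}\Bigl(\sum_{i} p_i - \sum_{i} y_i\Bigr)
= \frac{1}{K}\,(1-1) = 0 .
\]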