Knowledge distillation has become one of the most important model compression techniques, transferring knowledge from a larger teacher network to a smaller student network. Although prior distillation methods have achieved great success by carefully designing various types of knowledge to transfer, they overlook the functional properties of neural networks, which makes applying these techniques to new tasks unreliable and non-trivial. To alleviate this problem, in this paper we leverage Lipschitz continuity to better characterize the functional behavior of neural networks and to guide the knowledge distillation process. In particular, we propose a novel Lipschitz Continuity Guided Knowledge Distillation framework that faithfully distills knowledge by minimizing the distance between the two networks' Lipschitz constants, which enables the teacher network to better regularize the student network and improves the resulting performance. Since computing the exact Lipschitz constant of a neural network is NP-hard, we derive an explainable approximation algorithm with an explicit theoretical derivation. Experimental results show that our method outperforms other baselines on several knowledge distillation tasks (e.g., classification, segmentation, and object detection) on the CIFAR-100, ImageNet, and PASCAL VOC datasets.
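To make the idea concrete, the sketch below illustrates one way such a Lipschitz-guided distillation loss could look; it is not the paper's algorithm. As a stand-in for the authors' approximation, it uses the product of per-layer spectral norms, a standard upper bound on the Lipschitz constant of a feedforward network with 1-Lipschitz activations (and only a crude proxy for convolutional layers). The function names `lipschitz_upper_bound` and `lipschitz_kd_loss`, the weight `alpha`, and the temperature `T` are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def lipschitz_upper_bound(model: nn.Module) -> torch.Tensor:
    """Product of per-layer spectral norms: an upper bound on the Lipschitz
    constant for feedforward nets with 1-Lipschitz activations. Conv kernels
    are flattened to matrices, which is a common heuristic, not an exact bound."""
    bound = torch.tensor(1.0)
    for m in model.modules():
        if isinstance(m, (nn.Linear, nn.Conv2d)):
            W = m.weight.reshape(m.weight.size(0), -1)
            bound = bound * torch.linalg.matrix_norm(W, ord=2)  # largest singular value
    return bound


def lipschitz_kd_loss(student, teacher, x, y, alpha=0.1, T=4.0):
    """Standard KD loss (cross-entropy + temperature-scaled KL) plus a penalty
    on the gap between the two networks' approximate Lipschitz constants."""
    s_logits = student(x)
    with torch.no_grad():
        t_logits = teacher(x)
    ce = F.cross_entropy(s_logits, y)
    kd = F.kl_div(
        F.log_softmax(s_logits / T, dim=1),
        F.softmax(t_logits / T, dim=1),
        reduction="batchmean",
    ) * T * T
    # Penalize the distance between the student's and teacher's Lipschitz estimates.
    lip_gap = (lipschitz_upper_bound(student)
               - lipschitz_upper_bound(teacher).detach()).abs()
    return ce + kd + alpha * lip_gap
```

In practice, working with the sum of log spectral norms instead of their raw product may be numerically more stable for deep networks; the choice here is kept simple for readability.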