HomoDistil: Homotopic Task-Agnostic Distillation of Pre-trained Transformers

February 19, 2023

Chen Liang, Haoming Jiang, Zheng Li, Xianfeng Tang, Bin Yin, Tuo Zhao

Knowledge distillation has been shown to be a powerful model compression approach to facilitate the deployment of pre-trained language models in practice. This paper focuses on task-agnostic distillation. It produces a compact pre-trained model that can be easily fine-tuned on various tasks with small computational costs and memory footprints. Despite the practical benefits, task-agnostic distillation is challenging. Since the teacher model has a significantly larger capacity and stronger representation power than the student model, it is very difficult for the student to produce predictions that match the teacher's over a massive amount of open-domain training data. Such a large prediction discrepancy often diminishes the benefits of knowledge distillation. To address this challenge, we propose Homotopic Distillation (HomoDistil), a novel task-agnostic distillation approach equipped with iterative pruning. Specifically, we initialize the student model from the teacher model, and iteratively prune the student's neurons until the target width is reached. Such an approach maintains a small discrepancy between the teacher's and student's predictions throughout the distillation process, which ensures the effectiveness of knowledge transfer. Extensive experiments demonstrate that HomoDistil achieves significant improvements on existing baselines.
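The procedure described in the abstract (initialize the student from the teacher, then alternate between pruning a few neurons and distilling) can be illustrated with a minimal sketch on a toy two-layer ReLU network. This is not the authors' implementation. The function name `homotopic_distill`, the linear width schedule, the weight-norm importance score, the mean-squared-error distillation loss, and all hyperparameters are assumptions chosen to keep the example short and self-contained.

```python
import numpy as np

def forward(W1, W2, X, mask):
    """Toy two-layer ReLU network; `mask` zeroes out pruned hidden neurons."""
    H = np.maximum(X @ W1, 0.0) * mask
    return H @ W2, H

def homotopic_distill(teacher_W1, teacher_W2, X, target_width,
                      rounds=8, steps_per_round=50, lr=0.01):
    # The student starts as an exact copy of the teacher,
    # so the initial prediction discrepancy is zero.
    W1, W2 = teacher_W1.copy(), teacher_W2.copy()
    full_width = W1.shape[1]
    mask = np.ones(full_width)
    teacher_out, _ = forward(teacher_W1, teacher_W2, X, np.ones(full_width))
    history = []
    for r in range(1, rounds + 1):
        # Assumption: shrink the width linearly over the pruning rounds.
        width = round(full_width - (full_width - target_width) * r / rounds)
        # Assumption: score a neuron by the product of its incoming and
        # outgoing weight norms; already-pruned neurons can never return.
        score = np.linalg.norm(W1, axis=0) * np.linalg.norm(W2, axis=1)
        score[mask == 0] = -np.inf
        keep = np.argsort(score)[-width:]
        mask = np.zeros(full_width)
        mask[keep] = 1.0
        # Distill: gradient descent on the MSE between student and teacher outputs.
        for _ in range(steps_per_round):
            out, H = forward(W1, W2, X, mask)
            err = (out - teacher_out) / len(X)
            grad_W2 = H.T @ err
            grad_pre = (err @ W2.T) * (H > 0)  # H > 0 only for active, unpruned neurons
            W1 -= lr * (X.T @ grad_pre)
            W2 -= lr * grad_W2
        out, _ = forward(W1, W2, X, mask)
        history.append((int(mask.sum()), float(np.mean((out - teacher_out) ** 2))))
    return W1, W2, mask, history

# Hypothetical toy data: a random "teacher" with 32 hidden neurons, pruned to 8.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 8))
teacher_W1 = rng.normal(size=(8, 32)) / np.sqrt(8)
teacher_W2 = rng.normal(size=(32, 4)) / np.sqrt(32)
W1, W2, mask, history = homotopic_distill(teacher_W1, teacher_W2, X, target_width=8)
for width, mse in history:
    print(f"width={width:2d}  mse_vs_teacher={mse:.4f}")
```

Each round removes only a few neurons before the student is trained against the teacher again, which is the mechanism the abstract credits with keeping the teacher-student discrepancy small. The printed history makes it possible to inspect that discrepancy at every intermediate width.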
