[ACL 2021] Weight Distillation: A Method for Transferring Knowledge in Neural Network Weights

Knowledge distillation · Knowledge transfer

August 17, 2021

Source: 专知会员服务 (Zhuanzhi membership service)

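Knowledge distillation has been widely used in recent years as an effective method for model acceleration and model compression. It transfers knowledge from a large neural network to a small one by using the large network's predictions as the learning targets of the small network. However, this approach ignores the knowledge inside the large network, such as its weights. In this paper, we propose weight distillation, which transfers the knowledge in the large network's weights to the small network through a parameter generator. Experiments by the NiuTrans (小牛翻译) team on the WMT16 En-Ro, NIST12 Zh-En, and WMT14 En-De machine translation tasks show that small networks trained with weight distillation are 1.88 to 2.94 times faster than the large network while still achieving strong translation performance.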
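The two ideas in the abstract can be illustrated with a minimal sketch. The first function shows the usual knowledge-distillation objective, in which the small network is trained against the large network's softened predictions. The second shows one possible shape for a parameter generator: a pair of projection matrices that map a large weight matrix onto a smaller one. The abstract does not specify the generator's form, so the linear projection, the function names (`distillation_loss`, `generate_student_weight`), the temperature value, and the matrix sizes are all assumptions made for illustration, not the authors' implementation.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy of the student's distribution against the teacher's
    softened predictions (the standard knowledge-distillation target)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def generate_student_weight(teacher_w, left, right):
    """Hypothetical parameter generator: left @ teacher_w @ right.

    Assumption: a linear map that shrinks both dimensions. In training,
    `left` and `right` would be learned; here they are random.
    """
    return matmul(matmul(left, teacher_w), right)


random.seed(0)
# Illustrative sizes only: an 8x8 teacher matrix reduced to 4x4.
teacher_w = [[random.gauss(0, 1) for _ in range(8)] for _ in range(8)]
left = [[random.gauss(0, 0.1) for _ in range(8)] for _ in range(4)]
right = [[random.gauss(0, 0.1) for _ in range(4)] for _ in range(8)]
student_w = generate_student_weight(teacher_w, left, right)

print(len(student_w), len(student_w[0]))
print(distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.5]))
```

In this sketch the student's weights are a function of the teacher's weights, so what is learned are the projection matrices, not the student's weights from scratch. This is only one way to read "parameter generator"; the paper should be consulted for the actual formulation.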
