PANDA: Prompt Transfer Meets Knowledge Distillation for Efficient Model Adaptation - 专知 (Zhuanzhi) Papers

August 22, 2022

Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, Dacheng Tao

Prompt-tuning, which freezes pretrained language models (PLMs) and fine-tunes only a few parameters of an additional soft prompt, shows competitive performance against full-parameter fine-tuning (i.e., model-tuning) when the PLM has billions of parameters, but still performs poorly on smaller PLMs. Hence, prompt transfer (PoT), which initializes the target prompt with the trained prompt of similar source tasks, was recently proposed to improve over prompt-tuning. However, such a vanilla PoT approach usually achieves sub-optimal performance, as (i) PoT is sensitive to the similarity of the source-target pair and (ii) directly fine-tuning the prompt initialized with the source prompt on the target task might lead to catastrophic forgetting of source knowledge. In response to these problems, we propose a new metric to accurately predict prompt transferability (regarding (i)), and a novel PoT approach (namely PANDA) that leverages knowledge distillation to transfer the "knowledge" from the source prompt to the target prompt in a subtle manner and effectively alleviate catastrophic forgetting (regarding (ii)). Furthermore, to achieve adaptive prompt transfer for each source-target pair, we use our metric to control the knowledge transfer in our PANDA approach. Extensive and systematic experiments on 189 combinations of 21 source and 9 target datasets across 5 scales of PLMs demonstrate that: 1) our proposed metric works well to predict prompt transferability; 2) our PANDA consistently outperforms the vanilla PoT approach by 2.3% average score (up to 24.1%) among all tasks and model sizes; 3) with our PANDA approach, prompt-tuning can achieve competitive and even better performance than model-tuning across various PLM scales. Code and models will be released upon acceptance.
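As a rough illustration of the idea in the abstract, the sketch below combines a target-task loss with a distillation term whose weight comes from a source-target similarity score. It is not the authors' implementation: the function names, the use of output logits as the distilled signal, the KL-divergence form, the temperature, and the clamping of the similarity score to [0, 1] are all assumptions made for this example.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Target-task loss for a single example."""
    return -math.log(softmax(logits)[label])


def kl_divergence(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distilled_prompt_loss(student_logits, teacher_logits, label, similarity):
    """Task loss plus a distillation term scaled by a transferability score.

    Assumption: `teacher_logits` come from the frozen PLM with the source
    prompt, `student_logits` from the same PLM with the target prompt, and
    `similarity` is a hypothetical transferability score clamped to [0, 1].
    """
    weight = min(max(similarity, 0.0), 1.0)
    task = cross_entropy(student_logits, label)
    distill = kl_divergence(teacher_logits, student_logits)
    return task + weight * distill
```

Under these assumptions, a dissimilar source task (score near 0) leaves the target prompt to learn almost entirely from the target data, while a similar source task (score near 1) pulls the target prompt's predictions toward those produced with the source prompt.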
