经向预训练代码模型的有效微调：一项实验研究及其拓展 (Towards Efficient Fine-tuning of Pre-trained Code Models: An Experimental Study and Beyond) - 专知论文

会员服务 ·

0

微调 · 预训练 · 代码 · 语义属性 · CodeBERT ·

2023 年 4 月 11 日

Towards Efficient Fine-tuning of Pre-trained Code Models: An Experimental Study and Beyond

翻译：经向预训练代码模型的有效微调：一项实验研究及其拓展

Ensheng Shi,Yanlin Wang,Hongyu Zhang,Lun Du,Shi Han,Dongmei Zhang,Hongbin Sun

from arxiv, Accepted by ISSTA 2023 (The 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis)

Recently, fine-tuning pre-trained code models such as CodeBERT on downstream tasks has achieved great success in many software testing and analysis tasks. While effective and prevalent, fine-tuning the pre-trained parameters incurs a large computational cost. In this paper, we conduct an extensive experimental study to explore what happens to layer-wise pre-trained representations and their encoded code knowledge during fine-tuning. We then propose efficient alternatives to fine-tune the large pre-trained code model based on the above findings. Our experimental study shows that (1) lexical, syntactic and structural properties of source code are encoded in the lower, intermediate, and higher layers, respectively, while the semantic property spans across the entire model. (2) The process of fine-tuning preserves most of the code properties. Specifically, the basic code properties captured by lower and intermediate layers are still preserved during fine-tuning. Furthermore, we find that only the representations of the top two layers change most during fine-tuning for various downstream tasks. (3) Based on the above findings, we propose Telly to efficiently fine-tune pre-trained code models via layer freezing. The extensive experimental results on five various downstream tasks demonstrate that training parameters and the corresponding time cost are greatly reduced, while performances are similar or better. Replication package including source code, datasets, and online Appendix is available at: \url{https://github.com/DeepSoftwareAnalytics/Telly}.

翻译：近年来，如 CodeBERT 等预训练代码模型的微调在软件测试和分析任务中取得了巨大成功。然而，微调预训练参数会带来巨大的计算代价。本文通过大规模实验研究，探讨了微调过程中不同层的预训练表征及其编码的代码知识的变化。在此基础上，提出了一种在保持模型性能的前提下有效微调预训练代码模型的方法。实验结果表明，源代码的词法、句法和结构属性主要编码在较低、中层，语义属性则涵盖整个模型。微调过程主要改变模型顶部层的表征。根据这些发现，我们提出了 Telly 这一通过层冻结实现高效微调预训练代码模型的方法。在五个不同应用场景的持续任务中，Telly 实现了训练参数和时间成本的显著降低，同时 performance 取得了与原有方法相当甚至更好的结果。源代码、数据集和在线附录均已开放共享。（翻译为自然语言可读性所作，如有不妥之处请包容指正。）

0

相关内容

百篇论文纵览大型语言模型最新研究进展

百篇论文纵览大型语言模型最新研究进展

专知会员服务

70+阅读 · 2023年3月31日

【清华大学】Delta调优:预训练语言模型参数有效方法的综合研究，Delta Tuning: A Comprehensive Study of Parameter Efficient Methods for Pre-trained Language Models

【清华大学】Delta调优:预训练语言模型参数有效方法的综合研究，Delta Tuning: A Comprehensive Study of Parameter Efficient Methods for Pre-trained Language Models

专知会员服务

26+阅读 · 2022年3月15日

【KDD2021】TUTA: 通用表格预训练的树结构Transformer

专知会员服务

25+阅读 · 2021年8月22日

预训练语言模型fine-tuning近期进展概述

预训练语言模型fine-tuning近期进展概述

专知会员服务

40+阅读 · 2021年4月9日

【Mila-Google】使用元学习动态调整源代码模型，On-the-Fly Adaptation of Source Code Models using Meta-Learning

【Mila-Google】使用元学习动态调整源代码模型，On-the-Fly Adaptation of Source Code Models using Meta-Learning

专知会员服务

21+阅读 · 2020年3月28日

【预训练论文】预训练Transformer校准，Calibration of Pre-trained Transformers

【预训练论文】预训练Transformer校准，Calibration of Pre-trained Transformers

专知会员服务

26+阅读 · 2020年3月19日

【Amazon】使用预先训练的Transformer模型进行数据增强，Data Augmentation using Pre-trained Transformer Models

【Amazon】使用预先训练的Transformer模型进行数据增强，Data Augmentation using Pre-trained Transformer Models

专知会员服务

51+阅读 · 2020年3月7日

【微软亚洲研究院】CodeBERT:用于编程和自然语言的预训练模型，CodeBERT: A Pre-Trained Model for Programming and Natural Languages

【微软亚洲研究院】CodeBERT:用于编程和自然语言的预训练模型，CodeBERT: A Pre-Trained Model for Programming and Natural Languages

专知会员服务

32+阅读 · 2020年2月21日

【Google ICLR2020论文】嵌入式大规模检索的预训练任务，Pre-training Tasks for Embedding-based Large-scale Retrieval

【Google ICLR2020论文】嵌入式大规模检索的预训练任务，Pre-training Tasks for Embedding-based Large-scale Retrieval

专知会员服务

28+阅读 · 2020年2月12日

【KDD2019教程】从浅层到深层的语言表达:预训练、微调，等等，From Shallow to Deep Language Representations: Pre-training, Fine-tuning, and Beyond

【KDD2019教程】从浅层到深层的语言表达:预训练、微调，等等，From Shallow to Deep Language Representations: Pre-training, Fine-tuning, and Beyond

专知会员服务

16+阅读 · 2019年11月4日

RoBERTa for Chinese：大规模中文预训练RoBERTa模型

RoBERTa for Chinese：大规模中文预训练RoBERTa模型

AINLP

30+阅读 · 2019年9月8日

BERT/Transformer/迁移学习NLP资源大列表

BERT/Transformer/迁移学习NLP资源大列表

专知

19+阅读 · 2019年6月9日

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

AINLP

40+阅读 · 2019年6月9日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

TorchSeg：基于pytorch的语义分割算法开源了

TorchSeg：基于pytorch的语义分割算法开源了

极市平台

20+阅读 · 2019年1月28日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

pytorch-pretrained-BERT：BERT PyTorch实现，可加载Google BERT预训练模型

pytorch-pretrained-BERT：BERT PyTorch实现，可加载Google BERT预训练模型

AINLP

35+阅读 · 2018年11月6日

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

专知

13+阅读 · 2018年6月24日

水管理策略对流域水循环和农业生产力的影响研究

国家自然科学基金

0+阅读 · 2014年12月31日

ADS强外中子源效应的时空动力学研究与模拟软件开发

国家自然科学基金

0+阅读 · 2013年12月31日

长记忆波动率模型的结构性质、统计推断及应用研究

国家自然科学基金

0+阅读 · 2013年12月31日

利用多光子干涉对SU(N)矩阵进行矩阵计算的实验研究

国家自然科学基金

0+阅读 · 2013年12月31日

长链非编码RNA-uc002mbe.2介导的HDACi凋亡效应及其在肝癌中的作用

国家自然科学基金

0+阅读 · 2012年12月31日

区域干旱过程模拟和农业干旱风险评价及应对研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于信道Time/Power度量指标的TOA测距误差模型及其应用研究

国家自然科学基金

0+阅读 · 2011年12月31日

声纹识别中合成语音的鲁棒性研究

国家自然科学基金

1+阅读 · 2009年12月31日

基于改进的支持向量机在语音识别中的应用研究

国家自然科学基金

0+阅读 · 2009年12月31日

基于双语文档反馈的跨语言信息检索研究

国家自然科学基金

0+阅读 · 2008年12月31日

Knowledge Transfer from Pre-trained Language Models to Cif-based Speech Recognizers via Hierarchical Distillation

Arxiv

0+阅读 · 2023年5月28日

Parameter-Efficient Fine-Tuning without Introducing New Latency

Arxiv

0+阅读 · 2023年5月26日

VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks

Arxiv

0+阅读 · 2023年5月25日

VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation

Arxiv

0+阅读 · 2023年5月25日

Few-shot Event Detection: An Empirical Study and a Unified View

Arxiv

0+阅读 · 2023年5月25日

Towards Parameter-Efficient Integration of Pre-Trained Language Models In Temporal Video Grounding

Arxiv

0+阅读 · 2023年5月25日

Pre-Trained Models: Past, Present and Future

Arxiv

19+阅读 · 2021年6月15日

Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting

Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting

Arxiv

21+阅读 · 2020年12月17日

Data Augmentation using Pre-trained Transformer Models

Arxiv

17+阅读 · 2020年3月4日

UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training

Arxiv

15+阅读 · 2020年2月28日

VIP会员

文章信息

相关主题

相关VIP内容

百篇论文纵览大型语言模型最新研究进展

百篇论文纵览大型语言模型最新研究进展

专知会员服务

70+阅读 · 2023年3月31日

【清华大学】Delta调优:预训练语言模型参数有效方法的综合研究，Delta Tuning: A Comprehensive Study of Parameter Efficient Methods for Pre-trained Language Models

【清华大学】Delta调优:预训练语言模型参数有效方法的综合研究，Delta Tuning: A Comprehensive Study of Parameter Efficient Methods for Pre-trained Language Models

专知会员服务

26+阅读 · 2022年3月15日

【KDD2021】TUTA: 通用表格预训练的树结构Transformer

专知会员服务

25+阅读 · 2021年8月22日

预训练语言模型fine-tuning近期进展概述

预训练语言模型fine-tuning近期进展概述

专知会员服务

40+阅读 · 2021年4月9日

【Mila-Google】使用元学习动态调整源代码模型，On-the-Fly Adaptation of Source Code Models using Meta-Learning

【Mila-Google】使用元学习动态调整源代码模型，On-the-Fly Adaptation of Source Code Models using Meta-Learning

专知会员服务

21+阅读 · 2020年3月28日

【预训练论文】预训练Transformer校准，Calibration of Pre-trained Transformers

【预训练论文】预训练Transformer校准，Calibration of Pre-trained Transformers

专知会员服务

26+阅读 · 2020年3月19日

【Amazon】使用预先训练的Transformer模型进行数据增强，Data Augmentation using Pre-trained Transformer Models

【Amazon】使用预先训练的Transformer模型进行数据增强，Data Augmentation using Pre-trained Transformer Models

专知会员服务

51+阅读 · 2020年3月7日

【微软亚洲研究院】CodeBERT:用于编程和自然语言的预训练模型，CodeBERT: A Pre-Trained Model for Programming and Natural Languages

【微软亚洲研究院】CodeBERT:用于编程和自然语言的预训练模型，CodeBERT: A Pre-Trained Model for Programming and Natural Languages

专知会员服务

32+阅读 · 2020年2月21日

【Google ICLR2020论文】嵌入式大规模检索的预训练任务，Pre-training Tasks for Embedding-based Large-scale Retrieval

【Google ICLR2020论文】嵌入式大规模检索的预训练任务，Pre-training Tasks for Embedding-based Large-scale Retrieval

专知会员服务

28+阅读 · 2020年2月12日

【KDD2019教程】从浅层到深层的语言表达:预训练、微调，等等，From Shallow to Deep Language Representations: Pre-training, Fine-tuning, and Beyond

【KDD2019教程】从浅层到深层的语言表达:预训练、微调，等等，From Shallow to Deep Language Representations: Pre-training, Fine-tuning, and Beyond

专知会员服务

16+阅读 · 2019年11月4日

热门VIP内容

开通专知VIP会员享更多权益服务

【伯克利博士论文】通过真实世界实践赋能机器人自主性

军用无人机集群技术尚未成熟——但潜力可期

人工智能安全治理白皮书（2025）

AgentOps综述：分类、挑战与未来方向

相关资讯

RoBERTa for Chinese：大规模中文预训练RoBERTa模型

RoBERTa for Chinese：大规模中文预训练RoBERTa模型

AINLP

30+阅读 · 2019年9月8日

BERT/Transformer/迁移学习NLP资源大列表

BERT/Transformer/迁移学习NLP资源大列表

专知

19+阅读 · 2019年6月9日

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

BERT/注意力机制/Transformer/迁移学习NLP资源大列表：awesome-bert-nlp

AINLP

40+阅读 · 2019年6月9日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

TorchSeg：基于pytorch的语义分割算法开源了

TorchSeg：基于pytorch的语义分割算法开源了

极市平台

20+阅读 · 2019年1月28日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

pytorch-pretrained-BERT：BERT PyTorch实现，可加载Google BERT预训练模型

pytorch-pretrained-BERT：BERT PyTorch实现，可加载Google BERT预训练模型

AINLP

35+阅读 · 2018年11月6日

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

专知

13+阅读 · 2018年6月24日

相关论文

Knowledge Transfer from Pre-trained Language Models to Cif-based Speech Recognizers via Hierarchical Distillation

Arxiv

0+阅读 · 2023年5月28日

Parameter-Efficient Fine-Tuning without Introducing New Latency

Arxiv

0+阅读 · 2023年5月26日

VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks

Arxiv

0+阅读 · 2023年5月25日

VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation

Arxiv

0+阅读 · 2023年5月25日

Few-shot Event Detection: An Empirical Study and a Unified View

Arxiv

0+阅读 · 2023年5月25日

Towards Parameter-Efficient Integration of Pre-Trained Language Models In Temporal Video Grounding

Arxiv

0+阅读 · 2023年5月25日

Pre-Trained Models: Past, Present and Future

Arxiv

19+阅读 · 2021年6月15日

Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting

Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting

Arxiv

21+阅读 · 2020年12月17日

Data Augmentation using Pre-trained Transformer Models

Arxiv

17+阅读 · 2020年3月4日

UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training

Arxiv

15+阅读 · 2020年2月28日

相关基金

水管理策略对流域水循环和农业生产力的影响研究

国家自然科学基金

0+阅读 · 2014年12月31日

ADS强外中子源效应的时空动力学研究与模拟软件开发

国家自然科学基金

0+阅读 · 2013年12月31日

长记忆波动率模型的结构性质、统计推断及应用研究

国家自然科学基金

0+阅读 · 2013年12月31日

利用多光子干涉对SU(N)矩阵进行矩阵计算的实验研究

国家自然科学基金

0+阅读 · 2013年12月31日

长链非编码RNA-uc002mbe.2介导的HDACi凋亡效应及其在肝癌中的作用

国家自然科学基金

0+阅读 · 2012年12月31日

区域干旱过程模拟和农业干旱风险评价及应对研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于信道Time/Power度量指标的TOA测距误差模型及其应用研究

国家自然科学基金

0+阅读 · 2011年12月31日

声纹识别中合成语音的鲁棒性研究

国家自然科学基金

1+阅读 · 2009年12月31日

基于改进的支持向量机在语音识别中的应用研究

国家自然科学基金

0+阅读 · 2009年12月31日

基于双语文档反馈的跨语言信息检索研究

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员