CAT-probing: A Metric-based Approach to Interpret How Pre-trained Models for Programming Language Attend Code Structure - 专知 (Zhuanzhi) Papers


Code · MoDELS · Attention · Score · Extensibility

December 10, 2022

CAT-probing: A Metric-based Approach to Interpret How Pre-trained Models for Programming Language Attend Code Structure


Nuo Chen, Qiushi Sun, Renyu Zhu, Xiang Li, Xuesong Lu, Ming Gao

From arXiv; accepted at EMNLP 2022

Code pre-trained models (CodePTMs) have recently demonstrated significant success in code intelligence. Some probing methods have been applied to interpret these models, but they fail to consider the inherent characteristics of code. To address this problem, we propose a novel probing method, CAT-probing, to quantitatively interpret how CodePTMs attend to code structure. We first denoise the input code sequences based on the token types pre-defined by the compilers, filtering out tokens whose attention scores are too small. We then define a new metric, CAT-score, to measure the commonality between the token-level attention scores generated in CodePTMs and the pair-wise distances between the corresponding AST nodes. The higher the CAT-score, the stronger the ability of CodePTMs to capture code structure. We conduct extensive experiments to integrate CAT-probing with representative CodePTMs for different programming languages. Experimental results show the effectiveness of CAT-probing in CodePTM interpretation. Our code and data are publicly available at https://github.com/nchen909/CodeAttention.
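The abstract does not give a formula for the CAT-score, so the following is only a minimal sketch of the general idea, under stated assumptions: tokens are first filtered by type, and the score is then taken as the share of token pairs flagged by either signal (high attention or small AST distance) that are flagged by both. The function names `filter_by_type` and `agreement_score`, the thresholds `theta_a` and `theta_d`, the token type labels, and the toy matrices are all hypothetical and are not taken from the paper or its repository.

```python
from typing import List, Sequence, Set


def filter_by_type(token_types: Sequence[str], keep: Set[str]) -> List[int]:
    """Return the indices of tokens whose type is in `keep` (hypothetical type labels)."""
    return [i for i, t in enumerate(token_types) if t in keep]


def agreement_score(
    attention: Sequence[Sequence[float]],
    distance: Sequence[Sequence[float]],
    theta_a: float,
    theta_d: float,
) -> float:
    """Assumed agreement ratio between attention and AST distance.

    A pair (i, j) counts as "attended" if attention[i][j] > theta_a and as
    "structurally close" if distance[i][j] < theta_d. The score is
    |attended AND close| / |attended OR close| over all pairs with i != j.
    """
    both = 0
    either = 0
    n = len(attention)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # skip self-pairs
            attended = attention[i][j] > theta_a
            close = distance[i][j] < theta_d
            if attended and close:
                both += 1
            if attended or close:
                either += 1
    return both / either if either else 0.0


# Toy example: three tokens that survived filtering.
kept = filter_by_type(["identifier", "punctuation", "keyword", "identifier"],
                      {"identifier", "keyword"})
attention = [[0.0, 0.6, 0.1],
             [0.5, 0.0, 0.05],
             [0.1, 0.4, 0.0]]
distance = [[0, 2, 6],
            [2, 0, 4],
            [6, 4, 0]]
score = agreement_score(attention, distance, theta_a=0.3, theta_d=5)
print(kept, score)
```

In this toy setting, a score near 1 would mean that the pairs a model attends to are mostly the same pairs that sit close together in the AST, which matches the abstract's reading that a higher score indicates a stronger ability to capture code structure.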

