Code pre-trained models (CodePTMs) have recently demonstrated significant success in code intelligence. To interpret these models, several probing methods have been applied. However, these methods fail to consider the inherent characteristics of code. In this paper, to address this problem, we propose a novel probing method, CAT-probing, to quantitatively interpret how CodePTMs attend to code structure. We first denoise the input code sequences based on the token types predefined by compilers, filtering out tokens whose attention scores are too small. After that, we define a new metric, CAT-score, to measure the commonality between the token-level attention scores generated by CodePTMs and the pair-wise distances between the corresponding AST nodes. The higher the CAT-score, the stronger the ability of CodePTMs to capture code structure. We conduct extensive experiments integrating CAT-probing with representative CodePTMs across different programming languages. Experimental results show the effectiveness of CAT-probing in CodePTM interpretation. Our code and data are publicly available at https://github.com/nchen909/CodeAttention.
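To make the metric concrete, the following is a minimal sketch (not the paper's exact formulation) of how an agreement-style CAT-score could be computed from a single attention head. The function name `cat_score`, the thresholds, and the agreement-counting scheme are illustrative assumptions; the precise definition is given in the paper.

```python
import numpy as np

def cat_score(attention, ast_distance, keep_mask,
              attn_threshold=0.05, dist_threshold=2):
    """Hypothetical sketch of a CAT-score-style agreement measure.

    attention:     (n, n) token-level attention scores from one CodePTM head
    ast_distance:  (n, n) pair-wise distances between the tokens' AST nodes
    keep_mask:     (n,) boolean mask of tokens kept after type-based denoising
    The thresholds are illustrative placeholders, not values from the paper.
    """
    idx = np.where(keep_mask)[0]
    agree, total = 0, 0
    for a in range(len(idx)):
        for b in range(a + 1, len(idx)):
            i, j = idx[a], idx[b]
            high_attn = attention[i, j] > attn_threshold
            close_in_ast = ast_distance[i, j] < dist_threshold
            # Count a pair as agreeing when high attention coincides
            # with small AST distance (or low attention with large distance).
            agree += int(high_attn == close_in_ast)
            total += 1
    # Higher agreement suggests the head's attention tracks AST proximity.
    return agree / total if total else 0.0
```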