On the Impact of Multiple Source Code Representations on Software Engineering Tasks -- An Empirical Study


Representation · Clone detection · Empirical study · Code · Software engineering

April 1, 2023


Karthik Chandra Swarna, Noble Saji Mathews, Dheeraj Vagavolu, Sridhar Chimalakonda

Efficiently representing source code is crucial for various software engineering tasks such as code classification and clone detection. Existing approaches primarily use Abstract Syntax Tree (AST), and only a few focus on semantic graphs such as Control Flow Graph (CFG) and Program Dependency Graph (PDG), which contain information about source code that AST does not. Even though some works tried to utilize multiple representations, they do not provide any insights about the costs and benefits of using multiple representations. The primary goal of this paper is to discuss the implications of utilizing multiple code representations, specifically AST, CFG, and PDG. We modify an AST path-based approach to accept multiple representations as input to an attention-based model. We do this to measure the impact of additional representations (such as CFG and PDG) over AST. We evaluate our approach on three tasks: Method Naming, Program Classification, and Clone Detection. Our approach increases the performance on these tasks by 11% (F1), 15.7% (Accuracy), and 9.3% (F1), respectively, over the baseline. In addition to the effect on performance, we discuss timing overheads incurred with multiple representations. We envision this work providing researchers with a lens to evaluate combinations of code representations for various tasks.
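As a rough illustration of what feeding multiple representations to an attention-based model can look like, the sketch below pools path vectors drawn from AST, CFG, and PDG into a single code vector using soft attention, in the spirit of AST path-based models. It is not the authors' implementation: the abstract does not say how the representations are combined, so merging all paths into one bag is an assumption, and the function names (`aggregate`, `softmax`, `dot`), the fixed attention vector, and the toy embeddings are all hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def aggregate(paths_by_repr: Dict[str, List[Vector]], attention: Vector) -> Vector:
    """Pool path vectors from every representation into one code vector.

    Assumption: paths from all representations share one embedding space
    and one attention vector; the paper may combine them differently.
    """
    # Flatten the per-representation bags into a single bag of path vectors.
    bag = [vec for vectors in paths_by_repr.values() for vec in vectors]
    if not bag:
        raise ValueError("at least one path vector is required")
    # One attention weight per path, regardless of which graph it came from.
    weights = softmax([dot(vec, attention) for vec in bag])
    dim = len(attention)
    # Weighted sum of the path vectors, computed per dimension.
    return [sum(w * vec[i] for w, vec in zip(weights, bag)) for i in range(dim)]


# Toy 2-dimensional path embeddings (made-up numbers, not from the paper).
example = {
    "AST": [[1.0, 0.0], [0.0, 1.0]],
    "CFG": [[1.0, 1.0]],
    "PDG": [[0.5, 0.5]],
}
# A zero attention vector gives every path the same weight (a plain average).
code_vector = aggregate(example, attention=[0.0, 0.0])
```

In a trained model the attention vector and the path embeddings would be learned, so paths from whichever representation is most informative for a task would receive larger weights. Adding CFG and PDG paths also enlarges the bag, which is one plausible source of the timing overhead the abstract mentions.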

