ASTRO: 通用神经克隆探测的AST辅助方法 (ASTRO: An AST-Assisted Approach for Generalizable Neural Clone Detection) - 专知论文

会员服务 ·

0

查全率/召回率 · state-of-the-art · 讲稿 · MoDELS · 代码 ·

2022 年 8 月 17 日

ASTRO: An AST-Assisted Approach for Generalizable Neural Clone Detection

翻译：ASTRO: 通用神经克隆探测的AST辅助方法

Yifan Zhang,Junwen Yang,Haoyu Dong,Qingchen Wang,Huajie Shao,Kevin Leach,Yu Huang

Neural clone detection has attracted the attention of software engineering researchers and practitioners. However, most neural clone detection methods do not generalize beyond the scope of clones that appear in the training dataset. This results in poor model performance, especially in terms of model recall. In this paper, we present an Abstract Syntax Tree (AST) assisted approach for generalizable neural clone detection, or ASTRO, a framework for finding clones in codebases reflecting industry practices. We present three main components: (1) an AST-inspired representation for source code that leverages program structure and semantics, (2) a global graph representation that captures the context of an AST among a corpus of programs, and (3) a graph embedding for programs that, in combination with extant large-scale language models, improves state-of-the-art code clone detection. Our experimental results show that ASTRO improves state-of-the-art neural clone detection approaches in both recall and F-1 scores.

翻译：神经克隆探测吸引了软件工程研究人员和从业人员的注意,然而,大多数神经克隆探测方法并未超越培训数据集中出现的克隆的范围,造成模型性能差,特别是模型召回方面。在本文中,我们介绍了可通用神经克隆探测的简易语库辅助方法(AST),或ASTRO,这是在反映工业实践的代码库中寻找克隆的框架。我们介绍了三个主要组成部分:(1)由AST启发的源代码代表,该源代码利用了程序结构和语义学,(2)全球图形代表,将AST的背景包含在一系列程序之中,(3)程序图嵌入图集,与现有大规模语言模型相结合,改进最新技术的代码克隆探测。我们的实验结果表明,ASTRO改进了记忆和F-1分数中的最新神经克隆探测方法。

0

相关内容

查全率/召回率

查全率/召回率

NeurlPS 2022 | 自然语言处理相关论文分类整理

NeurlPS 2022 | 自然语言处理相关论文分类整理

专知会员服务

51+阅读 · 2022年10月2日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

2020数据工程师成长路线图

专知会员服务

19+阅读 · 2020年9月6日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

专知会员服务

93+阅读 · 2020年2月12日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Industry Talk2

【ICIG2021】Latest News & Announcements of the Industry Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年7月29日

【ICIG2021】Latest News & Announcements of the Industry Talk1

【ICIG2021】Latest News & Announcements of the Industry Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年7月28日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

量子码的构造

国家自然科学基金

1+阅读 · 2015年12月31日

Vaspin在胰岛β细胞炎症、胰岛素抵抗及氧化应激中的作用及机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

Catestatin蛋白肽段抑制动脉粥样硬化的作用及机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

有限长区域中的空间耦合多元Rateless码研究

国家自然科学基金

0+阅读 · 2012年12月31日

面向E级计算的并行代数多重网格新型算法研究

国家自然科学基金

0+阅读 · 2012年12月31日

闪锌矿成矿过程的Fe-Ca-Zn三元耦合系统

国家自然科学基金

0+阅读 · 2012年12月31日

随机泛函微分方程的渐近行为

国家自然科学基金

0+阅读 · 2012年12月31日

Gata6对血管损伤修复和动脉粥样硬化形成的作用及其机制

国家自然科学基金

0+阅读 · 2011年12月31日

氧化还原信号在SIRT1影响tat介导HIV转录激活的作用机制

国家自然科学基金

0+阅读 · 2008年12月31日

几类随机泛函微分方程数值方法的收敛性、稳定性和散逸性

国家自然科学基金

0+阅读 · 2008年12月31日

Explanations as Programs in Probabilistic Logic Programming

Arxiv

0+阅读 · 2022年10月6日

Just ClozE! A Fast and Simple Method for Evaluating the Factual Consistency in Abstractive Summarization

Arxiv

0+阅读 · 2022年10月6日

Towards Adversarially Robust Deepfake Detection: An Ensemble Approach

Arxiv

0+阅读 · 2022年10月5日

Capturing and Animation of Body and Clothing from Monocular Video

Arxiv

0+阅读 · 2022年10月4日

Reliable Face Morphing Attack Detection in On-The-Fly Border Control Scenario with Variation in Image Resolution and Capture Distance

Arxiv

0+阅读 · 2022年9月30日

A Novel Explainable Out-of-Distribution Detection Approach for Spiking Neural Networks

Arxiv

0+阅读 · 2022年9月30日

Your Out-of-Distribution Detection Method is Not Robust!

Arxiv

0+阅读 · 2022年9月30日

Learning an Efficient Terrain Representation for Haptic Localization of a Legged Robot

Arxiv

0+阅读 · 2022年9月29日

Predictive Engagement: An Efficient Metric For Automatic Evaluation of Open-Domain Dialogue Systems

Predictive Engagement: An Efficient Metric For Automatic Evaluation of Open-Domain Dialogue Systems

Arxiv

11+阅读 · 2019年11月4日

Constructing Narrative Event Evolutionary Graph for Script Event Prediction

Arxiv

11+阅读 · 2018年5月16日

VIP会员

文章信息

相关主题

查全率/召回率

state-of-the-art

相关VIP内容

NeurlPS 2022 | 自然语言处理相关论文分类整理

NeurlPS 2022 | 自然语言处理相关论文分类整理

专知会员服务

51+阅读 · 2022年10月2日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

2020数据工程师成长路线图

专知会员服务

19+阅读 · 2020年9月6日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

专知会员服务

93+阅读 · 2020年2月12日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

【ICCV2025教程】基础模型遇见具身智能体

军事机器学习设计：关于开发自动化任务摘要系统的梯次化设计科学研究 | 2025最新93页

扩散模型中的缓存方法综述：迈向高效的多模态生成

【ICCV2025教程】《迈向视觉语言模型的全面推理》

相关资讯

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Industry Talk2

【ICIG2021】Latest News & Announcements of the Industry Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年7月29日

【ICIG2021】Latest News & Announcements of the Industry Talk1

【ICIG2021】Latest News & Announcements of the Industry Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年7月28日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

相关论文

Explanations as Programs in Probabilistic Logic Programming

Arxiv

0+阅读 · 2022年10月6日

Just ClozE! A Fast and Simple Method for Evaluating the Factual Consistency in Abstractive Summarization

Arxiv

0+阅读 · 2022年10月6日

Towards Adversarially Robust Deepfake Detection: An Ensemble Approach

Arxiv

0+阅读 · 2022年10月5日

Capturing and Animation of Body and Clothing from Monocular Video

Arxiv

0+阅读 · 2022年10月4日

Reliable Face Morphing Attack Detection in On-The-Fly Border Control Scenario with Variation in Image Resolution and Capture Distance

Arxiv

0+阅读 · 2022年9月30日

A Novel Explainable Out-of-Distribution Detection Approach for Spiking Neural Networks

Arxiv

0+阅读 · 2022年9月30日

Your Out-of-Distribution Detection Method is Not Robust!

Arxiv

0+阅读 · 2022年9月30日

Learning an Efficient Terrain Representation for Haptic Localization of a Legged Robot

Arxiv

0+阅读 · 2022年9月29日

Predictive Engagement: An Efficient Metric For Automatic Evaluation of Open-Domain Dialogue Systems

Predictive Engagement: An Efficient Metric For Automatic Evaluation of Open-Domain Dialogue Systems

Arxiv

11+阅读 · 2019年11月4日

Constructing Narrative Event Evolutionary Graph for Script Event Prediction

Arxiv

11+阅读 · 2018年5月16日

相关基金

量子码的构造

国家自然科学基金

1+阅读 · 2015年12月31日

Vaspin在胰岛β细胞炎症、胰岛素抵抗及氧化应激中的作用及机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

Catestatin蛋白肽段抑制动脉粥样硬化的作用及机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

有限长区域中的空间耦合多元Rateless码研究

国家自然科学基金

0+阅读 · 2012年12月31日

面向E级计算的并行代数多重网格新型算法研究

国家自然科学基金

0+阅读 · 2012年12月31日

闪锌矿成矿过程的Fe-Ca-Zn三元耦合系统

国家自然科学基金

0+阅读 · 2012年12月31日

随机泛函微分方程的渐近行为

国家自然科学基金

0+阅读 · 2012年12月31日

Gata6对血管损伤修复和动脉粥样硬化形成的作用及其机制

国家自然科学基金

0+阅读 · 2011年12月31日

氧化还原信号在SIRT1影响tat介导HIV转录激活的作用机制

国家自然科学基金

0+阅读 · 2008年12月31日

几类随机泛函微分方程数值方法的收敛性、稳定性和散逸性

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员