On the Nature of Code Cloning in Open-Source Java Projects - 专知论文 (Zhuanzhi Papers)


Java · Exact · Engineering · Processing (programming language) · Datasets

July 9, 2021


Yaroslav Golubev, Timofey Bryksin

From arXiv, 7 pages, 8 figures

Code cloning plays a very important role in open-source software engineering. The presence of clones within a project may indicate a need for refactoring, and clones between projects are even more interesting, since they show that code migration takes place and that violations are possible. But how is code being copied? How prevalent is the process, and on what level does it happen? In this general study, we attempt to shed some light on these questions by searching for clones in a large dataset of over 23 thousand Java projects at the level of both files and methods, and by studying the code fragments themselves and their clone pairs. We study the size and the age of code fragments, the prevalence of their clones, and the relationships between exact and non-exact clones, as well as between method-level and file-level clones. We also describe various anomalies that we discover in the code clones. Our research shows that copying occurs throughout the years of Java code's existence and that method-level copying is much more prevalent than file-level copying: only 35.4% of methods have no clones. Additionally, some of the discovered anomalies can be useful for future large-scale cloning research, as they can be used for removing auto-generated code.
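To make the distinction between exact and non-exact clones concrete, here is a minimal sketch of one common way to compare two method bodies: treat each as a bag of tokens and measure how much the bags overlap. The abstract does not say which detector or similarity measure the authors used, so everything here is an assumption for illustration only: the function names `tokenize`, `similarity`, and `classify`, the simple regex tokenizer, and the 0.7 threshold are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical tokenizer: identifiers/keywords, integer literals, or any
# single non-whitespace character. Real clone detectors use a proper lexer.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z_0-9]*|\d+|\S")


def tokenize(code: str) -> Counter:
    """Turn a code fragment into a bag (multiset) of tokens, ignoring whitespace."""
    return Counter(TOKEN_RE.findall(code))


def similarity(a: str, b: str) -> float:
    """Number of shared tokens divided by the size of the larger fragment."""
    bag_a, bag_b = tokenize(a), tokenize(b)
    shared = sum((bag_a & bag_b).values())  # multiset intersection
    larger = max(sum(bag_a.values()), sum(bag_b.values()))
    return shared / larger if larger else 1.0


def classify(a: str, b: str, threshold: float = 0.7) -> str:
    """Label a pair of fragments; the 0.7 threshold is an assumed value."""
    score = similarity(a, b)
    if score == 1.0:
        return "exact"        # identical token bags (formatting may differ)
    if score >= threshold:
        return "non-exact"    # most tokens shared, e.g. a renamed method
    return "not a clone"


original = "int sum(int a, int b) { return a + b; }"
reformatted = "int sum(int a,int b){\n  return a + b;\n}"
renamed = "int add(int a, int b) { return a + b; }"
unrelated = "void log(String msg) { System.out.println(msg); }"

for other in (reformatted, renamed, unrelated):
    print(classify(original, other))
```

In this sketch, "exact" means only that the two token bags are identical, so a fragment that differs in whitespace still counts as exact, while a fragment with a renamed method falls into the non-exact category. The same comparison could be applied to whole files instead of methods to obtain file-level clones.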

