On the Nature of Code Cloning in Open-Source Java Projects - Zhuanzhi Papers


Java · Dataset · Exact · Engineering · Processing (programming language)

August 13, 2021

On the Nature of Code Cloning in Open-Source Java Projects


Yaroslav Golubev, Timofey Bryksin

From arXiv, 7 pages, 8 figures

Code cloning plays a very important role in open-source software engineering. The presence of clones within a project may indicate a need for refactoring, and clones between projects are even more interesting, since they show that code migration takes place and that violations are possible. But how is code being copied? How prevalent is the process, and at what level does it happen? In this general study, we attempt to shed some light on these questions by searching for clones in a large dataset of over 23 thousand Java projects at the level of both files and methods, and by studying the code fragments themselves and their clone pairs. We study the size and the age of code fragments, the prevalence of their clones, and the relationships between exact and non-exact clones, as well as between method-level and file-level clones. We also discover and describe various anomalies in the code clones that were detected in the dataset. Our research shows that copying occurs throughout the years of Java code's existence and that method-level copying is much more prevalent than file-level copying, with only 35.4% of methods having no clones at all. Additionally, some of the discovered anomalies can be useful for future large-scale cloning research, as they can be used for removing auto-generated code.
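To make the distinction between exact and non-exact clones more concrete, here is a minimal Java sketch of one common way to compare two method bodies: as bags of tokens. This is an illustration under stated assumptions, not the tooling used in the paper, which the abstract does not name. The class name `CloneSketch`, the simplified tokenizer, and the 0.7 threshold are all hypothetical. In this sketch, "exact" means identical token bags, so it ignores formatting and token order.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative token-bag clone comparison; not the paper's actual tooling. */
final class CloneSketch {
    // Assumed tokenizer: identifiers, keywords, and integer literals only.
    // Punctuation, operators, and whitespace are ignored.
    private static final Pattern TOKEN =
            Pattern.compile("[A-Za-z_][A-Za-z0-9_]*|[0-9]+");

    /** Counts how often each token occurs in a code fragment. */
    static Map<String, Integer> tokenBag(String code) {
        Map<String, Integer> bag = new HashMap<>();
        Matcher m = TOKEN.matcher(code);
        while (m.find()) {
            bag.merge(m.group(), 1, Integer::sum);
        }
        return bag;
    }

    /** Total number of tokens in a bag, counting repeats. */
    private static int size(Map<String, Integer> bag) {
        int total = 0;
        for (int count : bag.values()) {
            total += count;
        }
        return total;
    }

    /** Shared tokens divided by the size of the larger bag, in [0, 1]. */
    static double similarity(String a, String b) {
        Map<String, Integer> bagA = tokenBag(a);
        Map<String, Integer> bagB = tokenBag(b);
        int shared = 0;
        for (Map.Entry<String, Integer> e : bagA.entrySet()) {
            shared += Math.min(e.getValue(), bagB.getOrDefault(e.getKey(), 0));
        }
        int larger = Math.max(size(bagA), size(bagB));
        return larger == 0 ? 0.0 : (double) shared / larger;
    }

    /** Labels a pair EXACT, NEAR, or NONE for an assumed similarity threshold. */
    static String classify(String a, String b, double threshold) {
        double s = similarity(a, b);
        if (s == 1.0) {
            return "EXACT";
        }
        return s >= threshold ? "NEAR" : "NONE";
    }

    public static void main(String[] args) {
        String original = "int add(int a, int b) { return a + b; }";
        String reformatted = "int add(int a,int b){return a+b;}";
        String renamed = "int add(int x, int b) { return x + b; }";
        String unrelated = "void log(String msg) { System.out.println(msg); }";

        System.out.println(classify(original, reformatted, 0.7)); // EXACT
        System.out.println(classify(original, renamed, 0.7));     // NEAR
        System.out.println(classify(original, unrelated, 0.7));   // NONE
    }
}
```

The same comparison could in principle be applied to whole files instead of single methods, which is the other granularity the abstract mentions. A real study would need a proper Java lexer and an index to avoid comparing every pair of fragments.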

