使用数据挖掘算法推荐源代码变更 (Leveraging Data Mining Algorithms to Recommend Source Code Changes) - 专知论文

会员服务 ·

0

挖掘算法 · 数据挖掘 · 使用数据 · 算法 · 算法推荐 ·

2023 年 4 月 29 日

Leveraging Data Mining Algorithms to Recommend Source Code Changes

翻译：使用数据挖掘算法推荐源代码变更

AmirHossein Naghshzan,Saeed Khalilazar,Pierre Poilane,Olga Baysal,Latifa Guerrouj,Foutse Khomh

Context: Recent research has used data mining to develop techniques that can guide developers through source code changes. To the best of our knowledge, very few studies have investigated data mining techniques and--or compared their results with other algorithms or a baseline. Objectives: This paper proposes an automatic method for recommending source code changes using four data mining algorithms. We not only use these algorithms to recommend source code changes, but we also conduct an empirical evaluation. Methods: Our investigation includes seven open-source projects from which we extracted source change history at the file level. We used four widely data mining algorithms \ie{} Apriori, FP-Growth, Eclat, and Relim to compare the algorithms in terms of performance (Precision, Recall and F-measure) and execution time. Results: Our findings provide empirical evidence that while some Frequent Pattern Mining algorithms, such as Apriori may outperform other algorithms in some cases, the results are not consistent throughout all the software projects, which is more likely due to the nature and characteristics of the studied projects, in particular their change history. Conclusion: Apriori seems appropriate for large-scale projects, whereas Eclat appears to be suitable for small-scale projects. Moreover, FP-Growth seems an efficient approach in terms of execution time.

翻译：背景：最近的研究使用数据挖掘开发了可指导开发人员进行源代码变更的技术。据我们所知，很少有研究调查数据挖掘技术并/或将其结果与其他算法或基线进行比较。目标：本文提出了一种使用四种数据挖掘算法自动推荐源代码变更的方法。我们不仅使用这些算法来推荐源代码变更，而且我们还进行了实证评估。方法：我们的调查包括七个开源项目，从这些项目中以文件级别提取源变更历史记录。我们使用四种广泛应用的数据挖掘算法，即Apriori、FP-Growth、Eclat和Relim来比较这些算法在性能(Precision、Recall和F-measure)和执行时间方面的差异。结果：我们的研究结果提供了实证证据，证明一些频繁模式挖掘算法，如Apriori，在某些情况下可能优于其他算法，但结果并不始终如一，这更可能是由于所研究的项目的性质和特点，特别是它们的变更历史记录。结论：Apriori似乎适合大规模项目，而Eclat似乎适合小规模项目。此外，FP-Growth在执行时间方面似乎是一种高效的方法。

0

相关内容

挖掘算法

【干货书】数据分析优化，Optimization for Modern Data Analysis，117页pdf

【干货书】数据分析优化，Optimization for Modern Data Analysis，117页pdf

专知会员服务

65+阅读 · 2023年2月15日

【ICDM 2022教程】图挖掘中的公平性:度量、算法和应用

【ICDM 2022教程】图挖掘中的公平性:度量、算法和应用

专知会员服务

28+阅读 · 2022年12月26日

【RecSys22教程】多阶段推荐系统的神经重排序，90页ppt

【RecSys22教程】多阶段推荐系统的神经重排序，90页ppt

专知会员服务

27+阅读 · 2022年9月30日

如何使用TensorFlow 排序构建推荐系统? How to build a recommendation system using TensorFlow Ranking?

如何使用TensorFlow 排序构建推荐系统? How to build a recommendation system using TensorFlow Ranking?

专知会员服务

19+阅读 · 2022年3月13日

【超赞的#C++#速查&信息图】“hacking c++ - Cheat Sheets & Infographics”

【超赞的#C++#速查&信息图】“hacking c++ - Cheat Sheets & Infographics”

专知会员服务

30+阅读 · 2022年3月8日

【PKDD2020教程】可解释人工智能XAI:算法到应用，200页ppt

专知会员服务

41+阅读 · 2020年10月13日

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

【KDD2019|讲座推荐】工业中可解释的人工智能：Fake News Research: Theories, Detection Strategies, and Open Problems

专知会员服务

67+阅读 · 2019年12月9日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

LibRec 精选：推荐系统的常用数据集

LibRec 精选：推荐系统的常用数据集

LibRec智能推荐

17+阅读 · 2019年2月15日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

LibRec 精选：推荐系统的论文与源码

LibRec 精选：推荐系统的论文与源码

LibRec智能推荐

14+阅读 · 2018年11月29日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

LibRec 精选：基于LSTM的序列推荐实现（PyTorch）

LibRec 精选：基于LSTM的序列推荐实现（PyTorch）

LibRec智能推荐

50+阅读 · 2018年8月27日

【论文推荐】最新5篇行人再识别（ReID）相关论文—迁移学习、特征集成、重排序、多通道金字塔、深层生成模型

【论文推荐】最新5篇行人再识别（ReID）相关论文—迁移学习、特征集成、重排序、多通道金字塔、深层生成模型

专知

12+阅读 · 2018年3月24日

【推荐】免费书(草稿)：数据科学的数学基础

【推荐】免费书(草稿)：数据科学的数学基础

机器学习研究会

20+阅读 · 2017年10月1日

【推荐】SVM实例教程

【推荐】SVM实例教程

机器学习研究会

17+阅读 · 2017年8月26日

咖啡单核苷酸多态性分子标记的开发及遗传多样性分析

国家自然科学基金

1+阅读 · 2015年12月31日

贵州省汉族、苗族和布依族人群幽门螺杆菌耐药和毒力表型及其相关基因多态性

国家自然科学基金

0+阅读 · 2014年12月31日

基于协方差理论的UCT动态关联算法研究

国家自然科学基金

0+阅读 · 2013年12月31日

多梳基因家族蛋白EZH2对鼻咽癌细胞的放射敏感性及其相关机制的研究

国家自然科学基金

0+阅读 · 2012年12月31日

海量高维天体光谱数据挖掘及其并行化研究

国家自然科学基金

0+阅读 · 2012年12月31日

函数域中的Vinogradov中值定理

国家自然科学基金

0+阅读 · 2012年12月31日

c-Myc调控的miRNA在结肠癌恶性行为中的作用及其机制研究

国家自然科学基金

0+阅读 · 2011年12月31日

基于混合DNA策略的肺癌相关拷贝数变异的全基因组关联研究

国家自然科学基金

0+阅读 · 2011年12月31日

miR-124和miR-27对阿尔茨海默病BACE1基因影响的分子机制

国家自然科学基金

0+阅读 · 2011年12月31日

UGT基因簇进化及调控研究

国家自然科学基金

0+阅读 · 2009年12月31日

From BERT to GPT-3 Codex: Harnessing the Potential of Very Large Language Models for Data Management

Arxiv

0+阅读 · 2023年6月15日

Protecting User Privacy in Remote Conversational Systems: A Privacy-Preserving framework based on text sanitization

Arxiv

0+阅读 · 2023年6月14日

Rethink the Effectiveness of Text Data Augmentation: An Empirical Analysis

Arxiv

0+阅读 · 2023年6月13日

Deep Meta-learning in Recommendation Systems: A Survey

Arxiv

13+阅读 · 2022年6月9日

A Survey on Knowledge Graph-Based Recommender Systems

Arxiv

92+阅读 · 2020年2月28日

Text Classification Algorithms: A Survey

Arxiv

15+阅读 · 2019年6月25日

KGAT: Knowledge Graph Attention Network for Recommendation

Arxiv

40+阅读 · 2019年5月20日

Attention-based Group Recommendation

Arxiv

14+阅读 · 2018年4月18日

Ripple Network: Propagating User Preferences on the Knowledge Graph for Recommender Systems

Arxiv

12+阅读 · 2018年3月9日

Towards Understanding and Answering Multi-Sentence Recommendation Questions on Tourism

Arxiv

15+阅读 · 2018年1月5日

VIP会员

文章信息

相关主题

相关VIP内容

【干货书】数据分析优化，Optimization for Modern Data Analysis，117页pdf

【干货书】数据分析优化，Optimization for Modern Data Analysis，117页pdf

专知会员服务

65+阅读 · 2023年2月15日

【ICDM 2022教程】图挖掘中的公平性:度量、算法和应用

【ICDM 2022教程】图挖掘中的公平性:度量、算法和应用

专知会员服务

28+阅读 · 2022年12月26日

【RecSys22教程】多阶段推荐系统的神经重排序，90页ppt

【RecSys22教程】多阶段推荐系统的神经重排序，90页ppt

专知会员服务

27+阅读 · 2022年9月30日

如何使用TensorFlow 排序构建推荐系统? How to build a recommendation system using TensorFlow Ranking?

如何使用TensorFlow 排序构建推荐系统? How to build a recommendation system using TensorFlow Ranking?

专知会员服务

19+阅读 · 2022年3月13日

【超赞的#C++#速查&信息图】“hacking c++ - Cheat Sheets & Infographics”

【超赞的#C++#速查&信息图】“hacking c++ - Cheat Sheets & Infographics”

专知会员服务

30+阅读 · 2022年3月8日

【PKDD2020教程】可解释人工智能XAI:算法到应用，200页ppt

专知会员服务

41+阅读 · 2020年10月13日

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

【KDD2019|讲座推荐】工业中可解释的人工智能：Fake News Research: Theories, Detection Strategies, and Open Problems

专知会员服务

67+阅读 · 2019年12月9日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

最新《扩散模型原理》新书，470页pdf

无人机作战：演进、创新与未来战场

AI 智能体简史

多模态空间推理在大模型时代：综述与基准测试

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

LibRec 精选：推荐系统的常用数据集

LibRec 精选：推荐系统的常用数据集

LibRec智能推荐

17+阅读 · 2019年2月15日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

LibRec 精选：推荐系统的论文与源码

LibRec 精选：推荐系统的论文与源码

LibRec智能推荐

14+阅读 · 2018年11月29日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

LibRec 精选：基于LSTM的序列推荐实现（PyTorch）

LibRec 精选：基于LSTM的序列推荐实现（PyTorch）

LibRec智能推荐

50+阅读 · 2018年8月27日

【论文推荐】最新5篇行人再识别（ReID）相关论文—迁移学习、特征集成、重排序、多通道金字塔、深层生成模型

【论文推荐】最新5篇行人再识别（ReID）相关论文—迁移学习、特征集成、重排序、多通道金字塔、深层生成模型

专知

12+阅读 · 2018年3月24日

【推荐】免费书(草稿)：数据科学的数学基础

【推荐】免费书(草稿)：数据科学的数学基础

机器学习研究会

20+阅读 · 2017年10月1日

【推荐】SVM实例教程

【推荐】SVM实例教程

机器学习研究会

17+阅读 · 2017年8月26日

相关论文

From BERT to GPT-3 Codex: Harnessing the Potential of Very Large Language Models for Data Management

Arxiv

0+阅读 · 2023年6月15日

Protecting User Privacy in Remote Conversational Systems: A Privacy-Preserving framework based on text sanitization

Arxiv

0+阅读 · 2023年6月14日

Rethink the Effectiveness of Text Data Augmentation: An Empirical Analysis

Arxiv

0+阅读 · 2023年6月13日

Deep Meta-learning in Recommendation Systems: A Survey

Arxiv

13+阅读 · 2022年6月9日

A Survey on Knowledge Graph-Based Recommender Systems

Arxiv

92+阅读 · 2020年2月28日

Text Classification Algorithms: A Survey

Arxiv

15+阅读 · 2019年6月25日

KGAT: Knowledge Graph Attention Network for Recommendation

Arxiv

40+阅读 · 2019年5月20日

Attention-based Group Recommendation

Arxiv

14+阅读 · 2018年4月18日

Ripple Network: Propagating User Preferences on the Knowledge Graph for Recommender Systems

Arxiv

12+阅读 · 2018年3月9日

Towards Understanding and Answering Multi-Sentence Recommendation Questions on Tourism

Arxiv

15+阅读 · 2018年1月5日

相关基金

咖啡单核苷酸多态性分子标记的开发及遗传多样性分析

国家自然科学基金

1+阅读 · 2015年12月31日

贵州省汉族、苗族和布依族人群幽门螺杆菌耐药和毒力表型及其相关基因多态性

国家自然科学基金

0+阅读 · 2014年12月31日

基于协方差理论的UCT动态关联算法研究

国家自然科学基金

0+阅读 · 2013年12月31日

多梳基因家族蛋白EZH2对鼻咽癌细胞的放射敏感性及其相关机制的研究

国家自然科学基金

0+阅读 · 2012年12月31日

海量高维天体光谱数据挖掘及其并行化研究

国家自然科学基金

0+阅读 · 2012年12月31日

函数域中的Vinogradov中值定理

国家自然科学基金

0+阅读 · 2012年12月31日

c-Myc调控的miRNA在结肠癌恶性行为中的作用及其机制研究

国家自然科学基金

0+阅读 · 2011年12月31日

基于混合DNA策略的肺癌相关拷贝数变异的全基因组关联研究

国家自然科学基金

0+阅读 · 2011年12月31日

miR-124和miR-27对阿尔茨海默病BACE1基因影响的分子机制

国家自然科学基金

0+阅读 · 2011年12月31日

UGT基因簇进化及调控研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员