Repominer:地雷软件储存库的语言 -- -- 不可知的俾子体框架,用于破坏预测的地雷软件储存库 (RepoMiner: a Language-agnostic Python Framework to Mine Software Repositories for Defect Prediction) - 专知论文

会员服务 ·

0

MINE · MSR · INFORMS · 可理解性 · 真实值 ·

2021 年 11 月 23 日

RepoMiner: a Language-agnostic Python Framework to Mine Software Repositories for Defect Prediction

翻译：Repominer:地雷软件储存库的语言 -- -- 不可知的俾子体框架,用于破坏预测的地雷软件储存库

Stefano Dalla Palma,Dario Di Nucci,Damian Tamburri

Data originating from open-source software projects provide valuable information to enhance software quality. In the scope of Software Defect Prediction, one of the most challenging parts is extracting valid data about failure-prone software components from these repositories, which can help develop more robust software. In particular, collecting data, calculating metrics, and synthesizing results from these repositories is a tedious and error-prone task, which often requires understanding the programming languages involved in the mined repositories, eventually leading to a proliferation of language-specific data-mining software. This paper presents RepoMiner, a language-agnostic tool developed to support software engineering researchers in creating datasets to support any study on defect prediction. RepoMiner automatically collects failure data from software components, labels them as failure-prone or neutral, and calculates metrics to be used as ground truth for defect prediction models. We present its implementation and provide examples of its application.

翻译：开放源软件项目产生的数据为提高软件质量提供了宝贵的信息。在软件缺陷预测的范围内,最具挑战性的部分之一是从这些储存库中提取关于易发生故障的软件组件的有效数据,这些数据有助于开发更强的软件。特别是,收集数据、计算测量尺度和综合这些储存库的结果是一项乏味和容易出错的任务,这往往要求理解雷区中涉及的编程语言,最终导致语言特定数据挖掘软件的扩散。本文介绍了RepoMiner,这是为支持软件工程研究人员创建数据集以支持任何缺陷预测研究而开发的一种语言敏感工具。RepoMiner自动从软件组件中收集故障数据,将其标为易发生故障或中性数据,并计算指标,作为缺陷预测模型的地面真象。我们介绍其实施情况并提供应用实例。

0

相关内容

MINE

哥伦比亚大学最新《机器学习》课程，Fall-B 2020 (Machine Learning)

专知会员服务

39+阅读 · 2020年11月3日

2020数据工程师成长路线图

专知会员服务

41+阅读 · 2020年9月6日

【干货书】Python程序员编程，810页pdf，Python® for Programmers

【干货书】Python程序员编程，810页pdf，Python® for Programmers

专知会员服务

61+阅读 · 2020年8月6日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

80+阅读 · 2020年7月26日

面向机器学习和数据分析的特征工程（Feature Engineering for Machine Learning and Data Analytics），附新书419页pdf

面向机器学习和数据分析的特征工程（Feature Engineering for Machine Learning and Data Analytics），附新书419页pdf

专知会员服务

62+阅读 · 2019年10月26日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

计算机 | USENIX Security 2020等国际会议信息5条

计算机 | USENIX Security 2020等国际会议信息5条

Call4Papers

7+阅读 · 2019年4月25日

计算机 | CCF推荐期刊专刊信息5条

计算机 | CCF推荐期刊专刊信息5条

Call4Papers

3+阅读 · 2019年4月10日

人工智能 | SCI期刊专刊信息3条

人工智能 | SCI期刊专刊信息3条

Call4Papers

5+阅读 · 2019年1月10日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

机器学习线性代数速查

机器学习线性代数速查

机器学习研究会

19+阅读 · 2018年2月25日

Python机器学习教程资料/代码

Python机器学习教程资料/代码

机器学习研究会

8+阅读 · 2018年2月22日

计算机类 | 期刊专刊截稿信息9条

计算机类 | 期刊专刊截稿信息9条

Call4Papers

4+阅读 · 2018年1月26日

推荐｜Andrew Ng计算机视觉教程总结

推荐｜Andrew Ng计算机视觉教程总结

全球人工智能

3+阅读 · 2017年11月23日

【推荐】自然语言处理（NLP）指南

【推荐】自然语言处理（NLP）指南

机器学习研究会

35+阅读 · 2017年11月17日

【推荐】Python机器学习生态圈(Scikit-Learn相关项目)

【推荐】Python机器学习生态圈(Scikit-Learn相关项目)

机器学习研究会

6+阅读 · 2017年8月23日

Leveraging Structural Properties of Source Code Graphs for Just-In-Time Bug Prediction

Arxiv

0+阅读 · 2022年1月25日

Blockchain for Future Smart Grid: A Comprehensive Survey

Blockchain for Future Smart Grid: A Comprehensive Survey

Arxiv

21+阅读 · 2019年11月8日

Measuring Sentences Similarity: A Survey

Arxiv

7+阅读 · 2019年10月6日

Representation Learning with Contrastive Predictive Coding

Arxiv

6+阅读 · 2019年1月22日

Towards security defect prediction with AI

Arxiv

3+阅读 · 2018年9月12日

Sparse and Constrained Attention for Neural Machine Translation

Arxiv

4+阅读 · 2018年5月21日

SemStyle: Learning to Generate Stylised Image Captions using Unaligned Text

Arxiv

5+阅读 · 2018年5月18日

Predicting Cyber Events by Leveraging Hacker Sentiment

Arxiv

3+阅读 · 2018年4月14日

What is Wrong with Topic Modeling? (and How to Fix it Using Search-based Software Engineering)

Arxiv

3+阅读 · 2018年2月20日

SimplE Embedding for Link Prediction in Knowledge Graphs

Arxiv

7+阅读 · 2018年2月13日

VIP会员

文章信息

相关主题

相关VIP内容

哥伦比亚大学最新《机器学习》课程，Fall-B 2020 (Machine Learning)

专知会员服务

39+阅读 · 2020年11月3日

2020数据工程师成长路线图

专知会员服务

41+阅读 · 2020年9月6日

【干货书】Python程序员编程，810页pdf，Python® for Programmers

【干货书】Python程序员编程，810页pdf，Python® for Programmers

专知会员服务

61+阅读 · 2020年8月6日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

80+阅读 · 2020年7月26日

面向机器学习和数据分析的特征工程（Feature Engineering for Machine Learning and Data Analytics），附新书419页pdf

面向机器学习和数据分析的特征工程（Feature Engineering for Machine Learning and Data Analytics），附新书419页pdf

专知会员服务

62+阅读 · 2019年10月26日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

数据驱动死亡：以色列AI战争机器如何锁定目标

【普林斯顿博士论文】通过以人为本的评估推动负责任的人工智能

ICML 2025 | BiAssemble: 双臂机器人几何拼合问题的协同可供性学习

ICML 2025杰出论文出炉：8篇获奖，南大研究者榜上有名

相关资讯

计算机 | USENIX Security 2020等国际会议信息5条

计算机 | USENIX Security 2020等国际会议信息5条

Call4Papers

7+阅读 · 2019年4月25日

计算机 | CCF推荐期刊专刊信息5条

计算机 | CCF推荐期刊专刊信息5条

Call4Papers

3+阅读 · 2019年4月10日

人工智能 | SCI期刊专刊信息3条

人工智能 | SCI期刊专刊信息3条

Call4Papers

5+阅读 · 2019年1月10日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

机器学习线性代数速查

机器学习线性代数速查

机器学习研究会

19+阅读 · 2018年2月25日

Python机器学习教程资料/代码

Python机器学习教程资料/代码

机器学习研究会

8+阅读 · 2018年2月22日

计算机类 | 期刊专刊截稿信息9条

计算机类 | 期刊专刊截稿信息9条

Call4Papers

4+阅读 · 2018年1月26日

推荐｜Andrew Ng计算机视觉教程总结

推荐｜Andrew Ng计算机视觉教程总结

全球人工智能

3+阅读 · 2017年11月23日

【推荐】自然语言处理（NLP）指南

【推荐】自然语言处理（NLP）指南

机器学习研究会

35+阅读 · 2017年11月17日

【推荐】Python机器学习生态圈(Scikit-Learn相关项目)

【推荐】Python机器学习生态圈(Scikit-Learn相关项目)

机器学习研究会

6+阅读 · 2017年8月23日

相关论文

Leveraging Structural Properties of Source Code Graphs for Just-In-Time Bug Prediction

Arxiv

0+阅读 · 2022年1月25日

Blockchain for Future Smart Grid: A Comprehensive Survey

Blockchain for Future Smart Grid: A Comprehensive Survey

Arxiv

21+阅读 · 2019年11月8日

Measuring Sentences Similarity: A Survey

Arxiv

7+阅读 · 2019年10月6日

Representation Learning with Contrastive Predictive Coding

Arxiv

6+阅读 · 2019年1月22日

Towards security defect prediction with AI

Arxiv

3+阅读 · 2018年9月12日

Sparse and Constrained Attention for Neural Machine Translation

Arxiv

4+阅读 · 2018年5月21日

SemStyle: Learning to Generate Stylised Image Captions using Unaligned Text

Arxiv

5+阅读 · 2018年5月18日

Predicting Cyber Events by Leveraging Hacker Sentiment

Arxiv

3+阅读 · 2018年4月14日

What is Wrong with Topic Modeling? (and How to Fix it Using Search-based Software Engineering)

Arxiv

3+阅读 · 2018年2月20日

SimplE Embedding for Link Prediction in Knowledge Graphs

Arxiv

7+阅读 · 2018年2月13日

微信扫码咨询专知VIP会员