Root cause prediction based on bug reports - Zhuanzhi Papers


Bug · Machine Learning · Linear Support Vector Machine · Processing (programming language) · Scenario

March 3, 2021

Root cause prediction based on bug reports

Thomas Hirsch, Birgit Hofer

From arXiv, 6 pages

This paper proposes a supervised machine learning approach for predicting the root cause of a given bug report. Knowing the root cause of a bug can help developers in the debugging process, either directly or indirectly by guiding the choice of proper tool support for the debugging task. We mined 54,755 closed bug reports from the issue trackers of 103 GitHub projects and applied a set of heuristics to create a benchmark consisting of 10,459 reports. A subset was manually classified into three groups (semantic, memory, and concurrency) based on the bugs' root causes. Since the root-cause types are not equally distributed, a combination of keyword search and random selection was applied. Our data set for the machine learning approach consists of 369 bug reports (122 concurrency, 121 memory, and 126 semantic bugs). The bug reports are used as input to a natural language processing algorithm. We evaluated the performance of several classifiers for predicting the root causes of the given bug reports. Linear Support Vector Machines achieved the highest mean precision (0.74) and recall (0.72). The created bug data set and classification are publicly available.
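The abstract only says that keyword search and random selection were combined to cope with the uneven distribution of root-cause types. The sketch below shows one plausible way to do this and is not the authors' procedure: the keyword lists, the per-class quota and the function name `select_candidates` are all hypothetical, and every selected report would still have to be classified by hand.

```python
import random

# Hypothetical keyword lists; the paper's actual search terms are not given in the abstract.
KEYWORDS = {
    "concurrency": ("deadlock", "race condition", "thread", "synchroniz"),
    "memory": ("memory leak", "segfault", "null pointer", "buffer overflow"),
}


def select_candidates(reports, per_class, seed=0):
    """Pick up to `per_class` reports per keyword group, then `per_class` random
    reports from what is left (expected to contain mostly semantic bugs).
    The result is a set of candidates for manual classification, not labels."""
    rng = random.Random(seed)
    remaining = set(range(len(reports)))  # work with indices so duplicate texts stay distinct
    selection = {}
    for label, words in KEYWORDS.items():
        hits = sorted(i for i in remaining
                      if any(w in reports[i].lower() for w in words))
        chosen = rng.sample(hits, min(per_class, len(hits)))
        selection[label] = [reports[i] for i in chosen]
        remaining.difference_update(chosen)  # a report is selected at most once
    pool = sorted(remaining)
    chosen = rng.sample(pool, min(per_class, len(pool)))
    selection["random"] = [reports[i] for i in chosen]
    return selection
```

Keyword hits over-sample the rare concurrency and memory bugs, while the random draw covers the (presumably dominant) semantic bugs; a fixed seed keeps the selection reproducible.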

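The abstract states that the report text is fed to a natural language processing step and that a linear support vector machine scored best, but it does not name the features or the training algorithm. The following self-contained sketch assumes TF-IDF bag-of-words features and a one-vs-rest linear SVM trained with Pegasos-style stochastic subgradient descent on the hinge loss. The tokenizer, hyperparameters (`lam`, `epochs`) and all names (`SimpleTfidf`, `OneVsRestSVM`, etc.) are illustrative assumptions; in practice one would more likely use a library such as scikit-learn.

```python
import math
import random
import re
from collections import Counter

BIAS = "__bias__"  # constant feature; cannot clash with tokens (no underscores)


def tokenize(text):
    # Lowercase and keep runs of letters and digits (deliberately simple).
    return re.findall(r"[a-z0-9]+", text.lower())


class SimpleTfidf:
    """Term counts weighted by smoothed IDF, then L2-normalised (sparse dicts)."""

    def fit(self, docs):
        df = Counter()
        for doc in docs:
            df.update(set(tokenize(doc)))
        n = len(docs)
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
        return self

    def transform(self, doc):
        # Unknown words are ignored; the bias feature is appended after normalising.
        counts = Counter(t for t in tokenize(doc) if t in self.idf)
        vec = {t: c * self.idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        vec = {t: v / norm for t, v in vec.items()}
        vec[BIAS] = 1.0
        return vec


def dot(w, x):
    return sum(w.get(t, 0.0) * v for t, v in x.items())


def train_binary_svm(xs, ys, lam=0.01, epochs=200, seed=0):
    """Pegasos: stochastic subgradient descent on the L2-regularised hinge
    loss, labels in {+1, -1}. The bias weight is regularised like the rest."""
    rng = random.Random(seed)
    data = list(zip(xs, ys))
    w, step = {}, 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            step += 1
            eta = 1.0 / (lam * step)
            violates = y * dot(w, x) < 1.0  # margin computed with the old weights
            for t in w:  # shrink: gradient of the regulariser
                w[t] *= 1.0 - eta * lam
            if violates:  # hinge-loss subgradient for low-margin examples
                for t, v in x.items():
                    w[t] = w.get(t, 0.0) + eta * y * v
    return w


class OneVsRestSVM:
    def fit(self, docs, labels):
        self.vectorizer = SimpleTfidf().fit(docs)
        xs = [self.vectorizer.transform(d) for d in docs]
        self.classes = sorted(set(labels))
        # One binary "this class vs. all others" SVM per root-cause class.
        self.weights = {
            c: train_binary_svm(xs, [1 if lab == c else -1 for lab in labels])
            for c in self.classes
        }
        return self

    def predict(self, doc):
        # Pick the class whose binary classifier gives the highest score.
        x = self.vectorizer.transform(doc)
        return max(self.classes, key=lambda c: dot(self.weights[c], x))
```

The shrink loop touches every weight on each step, which is acceptable for a few hundred reports such as the 369 used here but slow for large vocabularies; scikit-learn's `TfidfVectorizer` combined with `LinearSVC` would be the usual production choice.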

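The reported scores are a mean precision of 0.74 and a mean recall of 0.72, but the abstract does not say how the mean is taken. The sketch below assumes unweighted (macro) averaging of per-class precision and recall over the three root-cause classes; the paper may instead average over cross-validation folds or weight by class size. Function names are illustrative.

```python
LABELS = ("concurrency", "memory", "semantic")


def per_class_scores(y_true, y_pred, labels):
    """Return {label: (precision, recall)}; an undefined ratio counts as 0."""
    scores = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[c] = (precision, recall)
    return scores


def macro_average(scores):
    """Unweighted mean over classes, so each root-cause class counts equally."""
    n = len(scores)
    return (sum(p for p, _ in scores.values()) / n,
            sum(r for _, r in scores.values()) / n)
```

Because the 369-report data set is nearly balanced (122/121/126), macro and size-weighted averages should differ little here.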