GloBug: 将全球数据用于错误本地化 (GloBug: Using Global Data in Fault Localization) - 专知论文

会员服务 ·

0

state-of-the-art · Automator · 均值 · INFORMS · CASES ·

2021 年 1 月 14 日

GloBug: Using Global Data in Fault Localization

翻译：GloBug: 将全球数据用于错误本地化

Nima Miryeganeh,Sepehr Hashtroudi,Hadi Hemmati

Fault Localization (FL) is an important first step in software debugging and is mostly manual in the current practice. Many methods have been proposed over years to automate the FL process, including information retrieval (IR)-based techniques. These methods localize the fault based on the similarity of the reported bug report and the source code. Newer variations of IR-based FL (IRFL) techniques also look into the history of bug reports and leverage them during the localization. However, all existing IRFL techniques limit themselves to the current project's data (local data). In this study, we introduce Globug, which is an IRFL framework consisting of methods that use models pre-trained on the global data (extracted from open-source benchmark projects). In Globug, we investigate two heuristics: a) the effect of global data on a state-of-the-art IR-FL technique, namely BugLocator, and b) the application of a Word Embedding technique (Doc2Vec) together with global data. Our large scale experiment on 51 software projects shows that using global data improves BugLocator on average 6.6% and 4.8% in terms of MRR (Mean Reciprocal Rank) and MAP (Mean Average Precision), with over 14% in a majority (64% and 54% in terms of MRR and MAP, respectively) of the cases. This amount of improvement is significant compared to the improvement rates that five other state-of-the-art IRFL tools provide over BugLocator. In addition, training the models globally is a one-time offline task with no overhead on BugLocator's run-time fault localization. Our study, however, shows that a Word Embedding-based global solution did not further improve the results.

翻译：错误本地化( FL) 是软件调试的重要第一步, 大多是当前实践中的手工操作。多年来, 提出了许多方法, 包括信息检索( IR) 技术, 包括信息检索( IR) 技术。这些方法根据报告的错误报告和源代码的相似性将错误本地化。以 IR 为基础的 FL ( IRFL) 技术的更新变异也查看了错误报告的历史, 并在本地化过程中加以利用。然而, 所有现有的 IRFL 技术都局限于当前项目的数据( 本地数据 ) 。在此研究中, 我们引入了 Globug, 这个由模型组成的IRL框架, 包括使用全球数据预先培训过的方法( 源自开源基准项目 ) 。在 Globbbbt 中, 全球数据数据对最新版本的 RFRF技术( 即 BugLocator ) 的影响, 以及应用Word Empload 改进工具( Doc2Vc) 工具, 以及全球数据。我们的大规模测试了51个软件模型( IMRBIL) 和 mLIL 的平均数据, 数据中, 这个系统运行中, 这个系统中的数据将显示一个百分比, 这个系统的数据为平均数据, 超过14 RBRBLLLLLLLLLL。

0

相关内容

state-of-the-art

state-of-the-art

2020数据工程师成长路线图

专知会员服务

41+阅读 · 2020年9月6日

【伯克利】自回归模型的局部掩卷积，Locally Masked Convolution for Autoregressive Models

【伯克利】自回归模型的局部掩卷积，Locally Masked Convolution for Autoregressive Models

专知会员服务

20+阅读 · 2020年6月23日

【NLP模型压缩方法综述】《A Survey of Methods for Model Compression in NLP》by Madison May

【NLP模型压缩方法综述】《A Survey of Methods for Model Compression in NLP》by Madison May

专知会员服务

43+阅读 · 2020年4月22日

因果图，Causal Graphs，52页ppt

因果图，Causal Graphs，52页ppt

专知会员服务

250+阅读 · 2020年4月19日

【论文翻译】2020最新预训练语言模型综述：Pre-trained Models for Natural Language Processing: A Survey

【论文翻译】2020最新预训练语言模型综述：Pre-trained Models for Natural Language Processing: A Survey

专知会员服务

94+阅读 · 2020年4月13日

【经典书】算法设计与分析，727页pdf，Algorithms Design and Analysis，牛津大学出版社

【经典书】算法设计与分析，727页pdf，Algorithms Design and Analysis，牛津大学出版社

专知会员服务

134+阅读 · 2020年2月25日

【Google可解释人工智能白皮书】27页pdf，AI Explainability Whitepaper ，Introduction to AI Explanations for AI Platform

【Google可解释人工智能白皮书】27页pdf，AI Explainability Whitepaper ，Introduction to AI Explanations for AI Platform

专知会员服务

127+阅读 · 2019年12月13日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

MIT新书《强化学习与最优控制》

MIT新书《强化学习与最优控制》

专知会员服务

280+阅读 · 2019年10月9日

已删除

将门创投

5+阅读 · 2017年8月15日

GA for feature selection of EEG heterogeneous data

Arxiv

0+阅读 · 2021年3月12日

A Two-step Calibration Method for Unfocused Light Field Camera Based on Projection Model Analysis

Arxiv

0+阅读 · 2021年3月10日

COLA-Net: Collaborative Attention Network for Image Restoration

Arxiv

0+阅读 · 2021年3月10日

Equi-Joins Over Encrypted Data for Series of Queries

Arxiv

0+阅读 · 2021年3月10日

Overcoming Data Limitation in Medical Visual Question Answering

Overcoming Data Limitation in Medical Visual Question Answering

Arxiv

4+阅读 · 2019年9月26日

DC-SPP-YOLO: Dense Connection and Spatial Pyramid Pooling Based YOLO for Object Detection

DC-SPP-YOLO: Dense Connection and Spatial Pyramid Pooling Based YOLO for Object Detection

Arxiv

3+阅读 · 2019年3月20日

Efficient Parameter-free Clustering Using First Neighbor Relations

Efficient Parameter-free Clustering Using First Neighbor Relations

Arxiv

7+阅读 · 2019年2月28日

Red blood cell image generation for data augmentation using Conditional Generative Adversarial Networks

Red blood cell image generation for data augmentation using Conditional Generative Adversarial Networks

Arxiv

4+阅读 · 2019年1月18日

Scale-Aware Trident Networks for Object Detection

Scale-Aware Trident Networks for Object Detection

Arxiv

4+阅读 · 2019年1月7日

OMNIA Faster R-CNN: Detection in the wild through dataset merging and soft distillation

OMNIA Faster R-CNN: Detection in the wild through dataset merging and soft distillation

Arxiv

6+阅读 · 2018年12月6日

VIP会员

文章信息

相关主题

state-of-the-art

相关VIP内容

2020数据工程师成长路线图

专知会员服务

41+阅读 · 2020年9月6日

【伯克利】自回归模型的局部掩卷积，Locally Masked Convolution for Autoregressive Models

【伯克利】自回归模型的局部掩卷积，Locally Masked Convolution for Autoregressive Models

专知会员服务

20+阅读 · 2020年6月23日

【NLP模型压缩方法综述】《A Survey of Methods for Model Compression in NLP》by Madison May

【NLP模型压缩方法综述】《A Survey of Methods for Model Compression in NLP》by Madison May

专知会员服务

43+阅读 · 2020年4月22日

因果图，Causal Graphs，52页ppt

因果图，Causal Graphs，52页ppt

专知会员服务

250+阅读 · 2020年4月19日

【论文翻译】2020最新预训练语言模型综述：Pre-trained Models for Natural Language Processing: A Survey

【论文翻译】2020最新预训练语言模型综述：Pre-trained Models for Natural Language Processing: A Survey

专知会员服务

94+阅读 · 2020年4月13日

【经典书】算法设计与分析，727页pdf，Algorithms Design and Analysis，牛津大学出版社

【经典书】算法设计与分析，727页pdf，Algorithms Design and Analysis，牛津大学出版社

专知会员服务

134+阅读 · 2020年2月25日

【Google可解释人工智能白皮书】27页pdf，AI Explainability Whitepaper ，Introduction to AI Explanations for AI Platform

【Google可解释人工智能白皮书】27页pdf，AI Explainability Whitepaper ，Introduction to AI Explanations for AI Platform

专知会员服务

127+阅读 · 2019年12月13日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

MIT新书《强化学习与最优控制》

MIT新书《强化学习与最优控制》

专知会员服务

280+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《美陆军特种作战条令》最新102页

《洛克希德SR-71“黑鸟”侦察机动力系统》21页slides

美空军作战实验室通过人工智能和指挥控制技术创新推进杀伤链

《指挥控制能力分析方法论》最新报告

相关资讯

已删除

将门创投

5+阅读 · 2017年8月15日

相关论文

GA for feature selection of EEG heterogeneous data

Arxiv

0+阅读 · 2021年3月12日

A Two-step Calibration Method for Unfocused Light Field Camera Based on Projection Model Analysis

Arxiv

0+阅读 · 2021年3月10日

COLA-Net: Collaborative Attention Network for Image Restoration

Arxiv

0+阅读 · 2021年3月10日

Equi-Joins Over Encrypted Data for Series of Queries

Arxiv

0+阅读 · 2021年3月10日

Overcoming Data Limitation in Medical Visual Question Answering

Overcoming Data Limitation in Medical Visual Question Answering

Arxiv

4+阅读 · 2019年9月26日

DC-SPP-YOLO: Dense Connection and Spatial Pyramid Pooling Based YOLO for Object Detection

DC-SPP-YOLO: Dense Connection and Spatial Pyramid Pooling Based YOLO for Object Detection

Arxiv

3+阅读 · 2019年3月20日

Efficient Parameter-free Clustering Using First Neighbor Relations

Efficient Parameter-free Clustering Using First Neighbor Relations

Arxiv

7+阅读 · 2019年2月28日

Red blood cell image generation for data augmentation using Conditional Generative Adversarial Networks

Red blood cell image generation for data augmentation using Conditional Generative Adversarial Networks

Arxiv

4+阅读 · 2019年1月18日

Scale-Aware Trident Networks for Object Detection

Scale-Aware Trident Networks for Object Detection

Arxiv

4+阅读 · 2019年1月7日

OMNIA Faster R-CNN: Detection in the wild through dataset merging and soft distillation

OMNIA Faster R-CNN: Detection in the wild through dataset merging and soft distillation

Arxiv

6+阅读 · 2018年12月6日

微信扫码咨询专知VIP会员