Automatic static analysis tools (ASATs), such as FindBugs, have a high false alarm rate. The large number of false alarms produced poses a barrier to adoption. Researchers have proposed the use of machine learning to prune false alarms and present only actionable warnings to developers. The state-of-the-art study identified a set of "Golden Features" based on metrics computed over the characteristics and history of the file, code, and warning. Recent studies show that machine learning using these features is extremely effective, achieving almost perfect performance. We perform a detailed analysis to better understand this strong performance of the "Golden Features". We found that several studies used an experimental procedure that results in data leakage and data duplication, subtle issues with significant implications. First, the ground-truth labels leaked into features that measure the proportion of actionable warnings in a given context. Second, many warnings in the testing dataset also appear in the training dataset. Next, we demonstrate limitations of the warning oracle that determines the ground-truth labels, a heuristic that compares warnings in a given revision against a reference revision in the future. We show that the choice of reference revision influences the warning distribution. Moreover, the heuristic produces labels that do not agree with human oracles. Hence, the strong performance reported for these techniques is an overoptimistic estimate of their true performance if adopted in practice. Our results convey several lessons and provide guidelines for evaluating false alarm detectors.