Automatically Repairing Code Defects by Combining Static Analysis Tools with Machine Learning (Leveraging Static Analysis for Bug Repair) - Zhuanzhi Papers


April 20, 2023

Leveraging Static Analysis for Bug Repair


Ruba Mutasim, Gabriel Synnaeve, David Pichardie, Baptiste Rozière

From arXiv, 13 pages. DL4C 2023.

We propose a method combining machine learning with a static analysis tool (i.e., Infer) to automatically repair source code. Machine learning methods perform well at producing idiomatic source code. However, their output is sometimes difficult to trust, as language models can output incorrect code with high confidence. Static analysis tools are trustworthy, but they are also less flexible and produce non-idiomatic code. In this paper, we propose to fix resource leak bugs in IR space and to use a sequence-to-sequence model to propose fixes in source code space. We also study several decoding strategies and use Infer to filter the output of the model. On a dataset of CodeNet submissions with potential resource leak bugs, our method is able to find a function with the same semantics that does not raise a warning with around 97% precision and 66% recall.
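
The abstract describes a generate-then-filter loop: a sequence-to-sequence model proposes candidate fixes (the decoding strategy, such as beam search or sampling, determines which candidates appear and in what order), and Infer discards any candidate that still raises a warning. The Python sketch below illustrates only that filtering step, plus one plausible way to compute precision and recall over its output. It is not the authors' implementation: `generate_candidates`, `has_warning` and `is_correct` are hypothetical callables standing in for the model, for an Infer run, and for a semantic-equivalence check; the toy analyzer is a crude string heuristic, not Infer; and the IR-space repair is not modeled. The metric definitions (precision over functions for which a fix is proposed, recall over all buggy functions) are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class RepairResult:
    original: str
    fix: Optional[str]  # None when no candidate passed the analyzer


def repair(
    buggy_function: str,
    generate_candidates: Callable[[str, int], List[str]],  # hypothetical model wrapper
    has_warning: Callable[[str], bool],                     # hypothetical analyzer wrapper
    num_candidates: int = 10,
) -> RepairResult:
    """Return the first model candidate that the analyzer does not flag."""
    # Candidates are assumed to arrive in the order the decoder ranks them
    # (e.g. by beam-search score); the first warning-free one is kept.
    for candidate in generate_candidates(buggy_function, num_candidates):
        if not has_warning(candidate):
            return RepairResult(buggy_function, candidate)
    return RepairResult(buggy_function, None)


def precision_recall(
    results: Sequence[RepairResult],
    is_correct: Callable[[str, str], bool],  # hypothetical semantic check
) -> Tuple[float, float]:
    """Precision over proposed fixes, recall over all buggy functions (assumed definitions)."""
    proposed = [r for r in results if r.fix is not None]
    correct = sum(1 for r in proposed if is_correct(r.original, r.fix))
    precision = correct / len(proposed) if proposed else 0.0
    recall = correct / len(results) if results else 0.0
    return precision, recall


# --- Toy usage (hypothetical stand-ins, not the paper's model or Infer) ---
BUGGY = "def read(p):\n    f = open(p)\n    return f.read()\n"
FIXED = "def read(p):\n    with open(p) as f:\n        return f.read()\n"


def toy_generate(code: str, n: int) -> List[str]:
    # Pretend decoder: returns the unchanged input first, then a fix.
    return [code, FIXED][:n]


def toy_has_warning(code: str) -> bool:
    # Crude heuristic in place of Infer: flag open() outside a `with` block.
    return "open(" in code and "with open(" not in code


if __name__ == "__main__":
    r1 = repair(BUGGY, toy_generate, toy_has_warning)
    r2 = repair(BUGGY, lambda code, n: [code], toy_has_warning)  # no fix survives
    print(precision_recall([r1, r2], lambda orig, fix: fix == FIXED))  # (1.0, 0.5)
```

With one function repaired correctly and one left unrepaired, the sketch reports a precision of 1.0 and a recall of 0.5. This illustrates how a strict analyzer filter can keep precision high while recall drops whenever no candidate survives.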
