The increasing complexity of software has led to a drastic rise in the time and cost of identifying and fixing bugs. Various approaches have been explored in the literature to automatically generate fixes for buggy code. However, due to the large combinatorial space of possible fixes for a particular bug, few tools and datasets are available to evaluate model-generated fixes effectively. In this work, we introduce FIXEVAL, a benchmark comprising buggy code submissions to competitive programming problems and their corresponding fixes. FIXEVAL provides a rich test suite for assessing the correctness of model-generated program fixes, along with metadata on time and memory constraints and the judge's acceptance verdict. We consider two Transformer language models pretrained on programming languages as our baselines and compare them using match-based and execution-based evaluation metrics. Our experiments show that match-based metrics do not accurately reflect the quality of model-generated program fixes, whereas execution-based methods evaluate programs against all test cases and scenarios designed explicitly for the corresponding problem. We therefore believe FIXEVAL provides a step toward real-world automatic bug fixing and model-generated code evaluation. The dataset and models are open-sourced.\footnote{\url{https://github.com/mahimanzum/FixEval}}