The most natural way to evaluate program repair systems is to run them on bug datasets such as Defects4J. Yet applying this evaluation technique to arbitrary real-world programs requires heavy configuration. In this paper, we propose a purely static method for evaluating the potential of the search spaces of repair approaches. This method lets researchers and practitioners encode the search spaces of repair approaches and select potentially useful ones without struggling with tool configuration and execution. We encode each search space by specifying the repair strategies it employs. We then use these specifications to check whether past commits lie in the repair search spaces. If the search space of a repair approach contains many human-written past commits, this indicates the approach's potential to generate useful patches. We implement our evaluation method in LighteR. LighteR takes a Git repository as input and outputs a list of commits whose source code changes lie in repair search spaces. We run LighteR on 55,309 commits from the history of 72 GitHub repositories and show that LighteR's precision and recall are 77% and 92%, respectively. Overall, our experiments show that our novel method is both lightweight and effective for studying the search spaces of program repair approaches.
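The idea of checking whether past commits lie in a repair search space can be illustrated with a minimal sketch. It assumes, purely for illustration, that each repair strategy is modeled as a predicate over a simplified single-statement change; the names `Change`, `removes_statement`, `wraps_in_null_check`, `SEARCH_SPACES`, and the two tool names are hypothetical and are not taken from LighteR.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Change:
    """A simplified source change: one statement before and after a commit."""
    commit_id: str
    before: str  # empty string means the statement was added
    after: str   # empty string means the statement was removed


# Assumption: a repair strategy is a predicate over a change.
Strategy = Callable[[Change], bool]


def removes_statement(change: Change) -> bool:
    """Matches changes that delete an existing statement."""
    return bool(change.before) and not change.after


def wraps_in_null_check(change: Change) -> bool:
    """Matches changes that wrap the old statement in an `if (... != null)`."""
    return (
        bool(change.before)
        and change.after.startswith("if (")
        and "!= null" in change.after
        and change.before in change.after
    )


# Hypothetical search-space specifications: each approach is described
# by the list of repair strategies it employs.
SEARCH_SPACES: Dict[str, List[Strategy]] = {
    "hypothetical_tool_a": [removes_statement],
    "hypothetical_tool_b": [removes_statement, wraps_in_null_check],
}


def commits_in_search_space(
    changes: List[Change], strategies: List[Strategy]
) -> List[str]:
    """Return ids of commits whose change matches at least one strategy."""
    return [c.commit_id for c in changes if any(s(c) for s in strategies)]
```

Under these assumptions, an approach whose specification matches more human-written commits would be considered to have a more promising search space. A real implementation would work on parsed source code changes extracted from a Git history rather than on raw strings.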