Automated Program Repair (APR) techniques have drawn wide attention from both academia and industry. Yet one main limitation of current state-of-the-art APR tools is that patches passing all the original tests are not necessarily the correct ones wanted by developers, i.e., the plausible patch problem. To date, various Patch-Correctness Checking (PCC) techniques have been proposed to address this important issue. However, they have been evaluated only on very limited datasets, because the APR tools used to generate such patches can explore only a small subset of the search space of possible patches, which poses serious threats to the external validity of existing PCC studies. In this paper, we construct an extensive PCC dataset (the largest manually labeled PCC dataset to our knowledge) to revisit all state-of-the-art PCC techniques. More specifically, our PCC dataset includes 1,988 patches generated by the recent PraPR APR tool, which leverages highly optimized bytecode-level patch execution and can exhaustively explore all possible plausible patches within its large predefined search space (including well-known fixing patterns from various prior APR tools). Our extensive study of representative PCC techniques on the new dataset has revealed various surprising findings and provides guidelines for future PCC research.