In this paper, we propose a novel technique, namely INVALIDATOR, to automatically assess the correctness of APR-generated patches via semantic and syntactic reasoning. INVALIDATOR reasons about program semantics via program invariants, and it captures program syntax via language semantics learned from a large code corpus using a pre-trained language model. Given a buggy program and the developer-patched program, INVALIDATOR infers likely invariants on both programs. INVALIDATOR then determines that an APR-generated patch overfits if (1) it violates correct specifications or (2) it maintains erroneous behaviors of the original buggy program. When invariants alone cannot establish that a patch overfits, INVALIDATOR uses a model trained on labeled patches to assess patch correctness based on program syntax. The benefit of INVALIDATOR is three-fold. First, INVALIDATOR leverages both semantic and syntactic reasoning to enhance its discriminative capability. Second, INVALIDATOR does not require new test cases to be generated; it relies only on the existing test suite and uses invariant inference to generalize the behaviors of a program. Third, INVALIDATOR is fully automated. We conducted experiments on a dataset of 885 patches generated for real-world programs in Defects4J. Experimental results show that INVALIDATOR correctly classified 79% of overfitting patches, detecting 23% more overfitting patches than the best baseline. INVALIDATOR also substantially outperforms the best baselines by 14% and 19% in terms of Accuracy and F-Measure, respectively.
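The two-stage decision described above can be illustrated with a minimal sketch. It is not INVALIDATOR's implementation: invariants are modeled as plain strings, "correct specifications" are assumed to be the invariants of the developer-patched program, "erroneous behaviors" are assumed to be the invariants that hold on the buggy program but not on the developer-patched one, and the names `classify_patch`, `syntactic_score`, and `threshold` are hypothetical.

```python
from typing import Callable, FrozenSet

Invariants = FrozenSet[str]


def classify_patch(
    buggy: Invariants,
    correct: Invariants,
    patched: Invariants,
    syntactic_score: Callable[[], float],
    threshold: float = 0.5,
) -> str:
    """Label an APR-generated patch as "overfitting" or "correct".

    Assumptions (illustrative, not taken from the paper):
    - correct specifications = invariants of the developer-patched program
    - erroneous behaviors    = invariants of the buggy program that the
                               developer patch removed
    - syntactic_score()      = overfitting probability from some trained
                               classifier, used only as a fallback
    """
    correct_specs = correct
    error_behaviors = buggy - correct

    # Stage 1: semantic check based on invariants.
    violated = correct_specs - patched      # correct specs the patch breaks
    retained = error_behaviors & patched    # buggy behaviors the patch keeps
    if violated or retained:
        return "overfitting"

    # Stage 2: syntactic fallback when invariants are inconclusive.
    return "overfitting" if syntactic_score() >= threshold else "correct"


if __name__ == "__main__":
    buggy = frozenset({"x >= 0", "ret == null"})
    correct = frozenset({"x >= 0", "ret != null"})
    patch_keeps_bug = frozenset({"x >= 0", "ret == null"})
    patch_matches = frozenset({"x >= 0", "ret != null"})

    print(classify_patch(buggy, correct, patch_keeps_bug, lambda: 0.0))
    print(classify_patch(buggy, correct, patch_matches, lambda: 0.1))
```

The semantic check runs first because it needs no training data; the classifier is consulted only when the invariant comparison finds nothing, mirroring the fallback order stated in the text.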