Contextual information plays a vital role for software developers when understanding and fixing a bug. Consequently, deep learning-based program repair techniques leverage context for bug fixes. However, existing techniques treat context arbitrarily: they extract code in close proximity to the buggy statement within the enclosing file, class, or method, without any analysis of how that code actually relates to the bug. To reduce noise, they impose a predefined maximum on the number of tokens used as context. We present a program slicing-based approach that, instead of arbitrarily including code as context, analyzes statements that have a control or data dependency on the buggy statement. We propose a novel concept called dual slicing, which leverages the context of both the buggy and the fixed versions of the code to capture relevant repair ingredients. We present our technique and tool, called Katana, the first to apply slicing-based context to a program repair task. The results show that Katana effectively preserves sufficient information for a model to choose contextual information while reducing noise. We compare against four recent state-of-the-art context-aware program repair techniques. Our results show that Katana fixes between 1.5 and 3.7 times more bugs than existing techniques.
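The idea of slicing-based context can be illustrated with a minimal sketch. It is not Katana's implementation: it assumes the control and data dependencies have already been extracted into a plain mapping from each statement to the statements it depends on, it computes only a backward slice, and the names `backward_slice`, `dual_slice`, `buggy_deps`, and `fixed_deps` are hypothetical.

```python
from collections import deque


def backward_slice(deps, criterion):
    """Return the sorted statement ids that `criterion` transitively depends on.

    `deps` maps a statement id to the ids it has a control or data
    dependency on (an assumed, precomputed representation).
    """
    seen = {criterion}
    queue = deque([criterion])
    while queue:
        stmt = queue.popleft()
        for dep in deps.get(stmt, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return sorted(seen)


def dual_slice(buggy_deps, buggy_stmt, fixed_deps, fixed_stmt):
    """Pair the slice of the buggy version with the slice of the fixed version."""
    return {
        "buggy": backward_slice(buggy_deps, buggy_stmt),
        "fixed": backward_slice(fixed_deps, fixed_stmt),
    }


# Hypothetical toy program; statement 5 is the buggy statement.
buggy_deps = {
    1: [],         # items = load()
    2: [],         # log("start")            (unrelated to the bug)
    3: [1],        # n = len(items)
    4: [],         # total = 0
    5: [3],        # for i in range(n + 1):  (buggy)
    6: [1, 4, 5],  # total += items[i]
}
# In this toy case the fix changes only the loop bound, so the edges are the same.
fixed_deps = dict(buggy_deps)

context = dual_slice(buggy_deps, 5, fixed_deps, 5)
print(context)
```

Here the slice for statement 5 keeps statements 1, 3, and 5 and drops the unrelated logging call and accumulator, whereas a proximity-based window would include them simply because they are nearby.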