Existing deep learning (DL)-based automated program repair (APR) models are limited in their ability to fix general software defects. We present {\tool}, a DL-based approach that fixes general bugs requiring dependent changes, made at once, to one or multiple consecutive statements in one or multiple hunks of code. We first design a novel fault localization (FL) technique for multi-hunk, multi-statement fixes that combines traditional spectrum-based (SB) FL with deep learning and data-flow analysis. It takes the buggy statements returned by the SBFL model, detects the buggy hunks to be fixed at once, and expands a buggy statement $s$ in a hunk to include other suspicious statements around $s$. We then design a two-tier, tree-based LSTM model that incorporates cycle training and uses a divide-and-conquer strategy to learn proper code transformations for fixing multiple statements in a suitable fixing context consisting of surrounding subtrees. We conducted several experiments to evaluate {\tool} on three datasets: Defects4J (395 bugs), BigFix (more than 26k bugs), and CPatMiner (more than 44k bugs). On the Defects4J dataset, {\tool} outperforms the baselines by 42\%--683\% in the number of bugs fixed automatically with only the top-1 patches. On the BigFix dataset, it fixes 31--145 more bugs than existing DL-based APR models with the top-1 patches. On the CPatMiner dataset, among the 667 fixed bugs, 169 (25.3\%) are multi-hunk/multi-statement bugs. {\tool} fixes 71 and 164 more bugs, including 52 and 61 more multi-hunk/multi-statement bugs, than the state-of-the-art DL-based APR models.
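To make the fault-localization pipeline described above more concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the SBFL step uses the standard Ochiai formula (the abstract does not name a formula), and it replaces the paper's deep learning and data-flow analysis with a plain score threshold when grouping and expanding statements. The function names `ochiai`, `group_into_hunks`, and `expand_hunk`, and the parameters `max_gap`, `threshold`, and `window`, are hypothetical.

```python
import math


def ochiai(failed_cov, passed_cov, total_failed):
    """Ochiai suspiciousness of a statement (assumed SBFL formula).

    failed_cov / passed_cov: number of failing / passing tests that
    execute the statement; total_failed: number of failing tests overall.
    """
    denom = math.sqrt(total_failed * (failed_cov + passed_cov))
    return failed_cov / denom if denom else 0.0


def group_into_hunks(lines, max_gap=1):
    """Group suspicious line numbers into hunks of adjacent lines.

    A simple stand-in for hunk detection: lines whose distance to the
    previous line is at most max_gap share a hunk.
    """
    hunks = []
    for ln in sorted(lines):
        if hunks and ln - hunks[-1][-1] <= max_gap:
            hunks[-1].append(ln)
        else:
            hunks.append([ln])
    return hunks


def expand_hunk(hunk, scores, threshold, window=1):
    """Expand a hunk with nearby statements that are also suspicious.

    A threshold heuristic standing in for the learned expansion step:
    any line within `window` of the hunk whose score is at least
    `threshold` is added to the hunk.
    """
    expanded = set(hunk)
    for ln in range(hunk[0] - window, hunk[-1] + window + 1):
        if scores.get(ln, 0.0) >= threshold:
            expanded.add(ln)
    return sorted(expanded)
```

In this sketch, a caller would score each covered line with `ochiai`, pass the highest-scoring lines to `group_into_hunks`, and then call `expand_hunk` on each hunk with a lower threshold to pull in neighbouring suspicious statements.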