Computing the differences between two versions of the same program is an essential task for software development and software evolution research. AST differencing is the most advanced way of doing so, and an active research area. Yet, AST differencing algorithms rely on configuration parameters that may have a strong impact on their effectiveness. In this paper, we present a novel approach named DAT (Diff Auto Tuning) for hyperparameter optimization of AST differencing. We thoroughly state the problem of hyper-configuration for AST differencing. We evaluate DAT, our data-driven approach, on optimizing the edit scripts produced by GumTree, the state-of-the-art AST differencing algorithm, in different scenarios. DAT is able to find a new configuration for GumTree that improves the edit scripts in 18.7% of the evaluated cases.
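To illustrate the tuning idea described above, here is a minimal, self-contained sketch: exhaustively search a small configuration space and keep the configuration whose total edit-script size is smallest. The parameter names (`min_height`, `sim_threshold`, `max_size`) and the `run_diff` cost function are illustrative stand-ins, not GumTree's actual API; a real setup would invoke the differ on each file pair and measure the resulting edit script.

```python
# Illustrative sketch of hyperparameter tuning for an AST differ.
# The search space and cost function are hypothetical stand-ins,
# NOT GumTree's real options: a real implementation would run the
# differ on each version pair and count edit-script actions.
import itertools

# Hypothetical knobs, loosely inspired by tree-matcher parameters.
SEARCH_SPACE = {
    "min_height": [1, 2, 3],
    "sim_threshold": [0.3, 0.5, 0.7],
    "max_size": [100, 500, 1000],
}

def run_diff(config, pair):
    """Stand-in for the differ: return a fake edit-script length
    so the example is self-contained and deterministic."""
    old, new = pair
    base = abs(len(old) - len(new)) + 10
    penalty = config["min_height"] + round(10 * config["sim_threshold"])
    return base + penalty

def tune(pairs):
    """Grid search: pick the configuration minimizing total cost."""
    keys = list(SEARCH_SPACE)
    best_cfg, best_cost = None, float("inf")
    for values in itertools.product(*SEARCH_SPACE.values()):
        cfg = dict(zip(keys, values))
        cost = sum(run_diff(cfg, p) for p in pairs)
        if cost < best_cost:
            best_cfg, best_cost = cfg, cost
    return best_cfg, best_cost
```

In practice, the space of diff configurations is far larger, so a sampling or model-based optimizer would replace the exhaustive loop; the selection criterion (smaller edit scripts are better) stays the same.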