In collaborative software development, program merging is the mechanism for integrating changes from multiple programmers. Merge algorithms in modern version control systems report a conflict when changes interfere textually. Merge conflicts require manual intervention and frequently stall modern continuous integration pipelines. Prior work found that, although resolving conflicts is costly, a large majority of resolutions involve re-arranging text without writing any new code. Inspired by this observation, we propose the first data-driven approach to resolving merge conflicts with a machine learning model. We realize our approach in a tool, DeepMerge, that uses a novel combination of (i) an edit-aware embedding of merge inputs and (ii) a variation of pointer networks to construct resolutions from input segments. We also propose an algorithm to localize manual resolutions in a resolved file and employ it to curate a ground-truth dataset comprising 8,719 non-trivial resolutions in JavaScript programs. Our evaluation shows that, on a held-out test set, DeepMerge predicts correct resolutions for 37% of non-trivial merges, compared to only 4% for a state-of-the-art semistructured merge technique. Furthermore, on the subset of merges with up to 3 lines (comprising 24% of the total dataset), DeepMerge predicts correct resolutions with 78% accuracy.
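To make the idea of constructing resolutions from input segments concrete, the following minimal sketch shows how a resolution that only re-arranges existing text can be written as a sequence of pointers into the merge inputs. It is not DeepMerge's model or its dataset code. The function names `pointer_sequence` and `decode`, the line-level granularity, and the choice to concatenate the two conflicting sides are all assumptions made for illustration.

```python
from typing import List, Optional


def pointer_sequence(a_lines: List[str],
                     b_lines: List[str],
                     resolution: List[str]) -> Optional[List[int]]:
    """Express a resolution as indices into the concatenated inputs.

    Hypothetical helper: returns None when the resolution contains a line
    that appears in neither side, i.e. when new code was written.
    """
    segments = a_lines + b_lines
    # Map each distinct line to the first position where it occurs.
    first_index = {}
    for i, line in enumerate(segments):
        first_index.setdefault(line, i)

    pointers = []
    for line in resolution:
        if line not in first_index:
            return None  # not expressible as a re-arrangement of inputs
        pointers.append(first_index[line])
    return pointers


def decode(a_lines: List[str], b_lines: List[str],
           pointers: List[int]) -> List[str]:
    """Rebuild the resolution text from a pointer sequence."""
    segments = a_lines + b_lines
    return [segments[i] for i in pointers]


# Two conflicting versions of the same region (illustrative data only).
side_a = ["let x = 1;", "log(x);"]
side_b = ["let x = 2;", "log(x);"]
resolved = ["let x = 2;", "log(x);"]

ptrs = pointer_sequence(side_a, side_b, resolved)
print(ptrs)
print(decode(side_a, side_b, ptrs))
```

In this framing, a learned model would predict the pointer sequence rather than free-form text, which is why resolutions that introduce new lines fall outside what such a sketch can represent.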