Program merging is ubiquitous in modern software development. Although text-based merge algorithms are used in most version control systems, they are prone to producing spurious merge conflicts: they report a conflict even when the program changes do not interfere with each other semantically. Spurious merge conflicts are costly to development because the need for manual intervention stalls modern continuous integration pipelines. We propose a novel data-driven approach to identify and resolve spurious merge conflicts with a sequence-to-sequence machine learning model. We realize our approach in a tool, DeepMerge, that uses a novel combination of (i) an edit-aware embedding of merge inputs and (ii) a variation of pointer networks to construct resolutions from input segments. We also propose an algorithm to extract ground-truth manual resolutions from a code corpus and employ it to curate a dataset comprising 10,729 non-trivial resolutions in JavaScript programs. Our evaluation shows that DeepMerge can predict correct resolutions with high precision (72%) and modest recall (34%) on the dataset overall, and with high recall (78%) on merges of up to 3 lines, which make up 24% of the dataset.
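To make the notion of a spurious conflict concrete, the following Python sketch models one conflicting region as a base version `O` and two edited versions `A` and `B`, and expresses a resolution as a sequence of pointers into the input lines, loosely mirroring the idea of constructing a resolution from input segments. Everything here is a hypothetical simplification for illustration: the line-based conflict rule, the `Region`/`Pointer` types, and the `is_conflict`/`resolve` helpers are not DeepMerge's implementation, and no learned model is involved (the pointers are supplied by hand rather than predicted).

```python
from typing import Dict, List, Tuple

# One conflicting region (hypothetical representation):
# "O" is the common base, "A" and "B" are the two edited versions.
Region = Dict[str, List[str]]
# Hypothetical pointer: (source version, line index within that version).
Pointer = Tuple[str, int]


def is_conflict(region: Region) -> bool:
    """Simplified line-based rule: report a conflict iff both sides
    changed the base region and their changes differ."""
    a_changed = region["A"] != region["O"]
    b_changed = region["B"] != region["O"]
    return a_changed and b_changed and region["A"] != region["B"]


def resolve(region: Region, pointers: List[Pointer]) -> List[str]:
    """Build a resolution by copying the input lines the pointers select."""
    return [region[src][idx] for src, idx in pointers]


# Both sides add a different import after the same line. A text-based
# merge flags this, yet the edits do not interfere semantically.
region: Region = {
    "O": ["import a from 'a';"],
    "A": ["import a from 'a';", "import b from 'b';"],
    "B": ["import a from 'a';", "import c from 'c';"],
}
print(is_conflict(region))
print(resolve(region, [("A", 0), ("A", 1), ("B", 1)]))
```

Here the resolution keeps both insertions by pointing at line 0 and 1 of `A` followed by line 1 of `B`; restricting the output to copied input lines is what keeps the space of candidate resolutions small in this kind of formulation.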