Collaborative software development is an integral part of the modern software development life cycle and is essential to the success of large-scale software projects. When multiple developers make concurrent changes around the same lines of code, a merge conflict may occur. Such conflicts stall pull requests and continuous integration pipelines for anywhere from hours to several days, seriously hurting developer productivity. In this paper, we introduce MergeBERT, a novel neural program merge framework based on token-level three-way differencing and a transformer encoder model. Exploiting the restricted nature of merge conflict resolutions, we reformulate the task of generating the resolution sequence as a classification task over a set of primitive merge patterns extracted from real-world merge commit data. Our model achieves 64-69% precision in merge resolution synthesis, a nearly 2x improvement over existing structured and neural program merge tools. Finally, we demonstrate the versatility of our model, which is able to perform program merge in a multilingual setting with the Java, JavaScript, TypeScript, and C# programming languages and generalizes zero-shot to unseen languages.
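To make the reformulation concrete, the following is a minimal sketch of how a resolution can be expressed as a choice among a small set of primitive merge patterns rather than as a free-form token sequence. The abstract does not list the patterns, so the five labels below, the `PATTERNS` table, and the helpers `resolve` and `matching_labels` are illustrative assumptions, not MergeBERT's actual label set or API. Here `a` and `b` are the token sequences of the two conflicting versions and `base` is their common ancestor.

```python
from typing import Callable, Dict, List

Tokens = List[str]

# Hypothetical pattern set (an assumption for illustration only): each label
# maps the three conflicting token sequences to a candidate resolution.
PATTERNS: Dict[str, Callable[[Tokens, Tokens, Tokens], Tokens]] = {
    "take_a": lambda a, b, base: a,
    "take_b": lambda a, b, base: b,
    "take_base": lambda a, b, base: base,
    "a_then_b": lambda a, b, base: a + b,
    "b_then_a": lambda a, b, base: b + a,
}


def resolve(label: str, a: Tokens, b: Tokens, base: Tokens) -> Tokens:
    """Build the resolution that a predicted pattern label stands for."""
    return list(PATTERNS[label](a, b, base))


def matching_labels(a: Tokens, b: Tokens, base: Tokens,
                    resolution: Tokens) -> List[str]:
    """Return the labels that reproduce a known resolution.

    Applied to historical merge commits, this is one way a conflict could be
    turned into a classification training example.
    """
    return [name for name, fn in PATTERNS.items()
            if fn(a, b, base) == resolution]


if __name__ == "__main__":
    # Token-level conflict on a type name: the developer kept version B.
    print(matching_labels(["int"], ["long"], ["short"], ["long"]))
    # A classifier predicting "a_then_b" would yield this resolution.
    print(resolve("a_then_b", ["foo", ","], ["bar"], []))
```

Under this framing, a model only has to predict one label per conflict, and the resolution follows deterministically from the inputs. Resolutions that introduce tokens absent from all three versions fall outside any such pattern set, and `matching_labels` returns an empty list for them.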