Pretrained language models have been shown to be effective in many software-related generation tasks; however, they are not well-suited for editing tasks, as they are not designed to reason about edits. To address this, we propose a novel pretraining objective that explicitly models edits and use it to build CoditT5, a large language model for software-related editing tasks that is pretrained on large amounts of source code and natural language comments. We fine-tune it on various downstream editing tasks, including comment updating, bug fixing, and automated code review. CoditT5 outperforms standard generation-based models on these tasks, demonstrating the generalizability of our approach and its suitability for editing tasks. We also show how a standard generation model and our edit-based model can complement one another through simple reranking strategies, with which we achieve state-of-the-art performance on all three downstream editing tasks.