Code is seldom written in a single left-to-right pass; it is instead repeatedly edited and refined. We introduce InCoder, a unified generative model that can perform program synthesis (via left-to-right generation) as well as editing (via infilling). InCoder is trained to generate code files from a large corpus of permissively licensed code, in which regions of code have been randomly masked and moved to the end of each file, allowing code infilling with bidirectional context. Ours is the first generative model able to directly perform zero-shot code infilling, which we evaluate on challenging tasks such as type inference, comment generation, and variable renaming. We find that the ability to condition on bidirectional context substantially improves performance on these tasks, while the model still performs comparably to left-to-right-only models pretrained at similar scale on standard program synthesis benchmarks. The InCoder models and code are publicly released at https://sites.google.com/view/incoder-code-models
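The training transform described above (masking a region and moving it to the end of the file) can be sketched in a few lines. This is an illustrative reconstruction, not the paper's implementation; the sentinel tokens `<MASK:0>` and `<EOM>` are assumed names for the mask and end-of-mask markers:

```python
import random

def causal_mask(tokens, seed=0):
    """Sketch of the infilling transform: a contiguous span is replaced
    by a sentinel and appended after a second sentinel, so a
    left-to-right model conditions on bidirectional context before
    generating the masked span. Sentinel names are illustrative."""
    rng = random.Random(seed)
    i = rng.randrange(len(tokens))            # span start (inclusive)
    j = rng.randrange(i + 1, len(tokens) + 1) # span end (exclusive)
    span = tokens[i:j]
    return (tokens[:i] + ["<MASK:0>"] + tokens[j:]
            + ["<MASK:0>"] + span + ["<EOM>"])

doc = ["def", "add", "(", "a", ",", "b", ")", ":", "return", "a", "+", "b"]
print(causal_mask(doc))
```

At inference time, infilling amounts to placing the sentinel at the edit location and letting the model generate the span after the second sentinel until it emits the end-of-mask token.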