Code is seldom written in a single left-to-right pass and is instead repeatedly edited and refined. We introduce InCoder, a unified generative model that can perform program synthesis (via left-to-right generation) as well as editing (via infilling). InCoder is trained to generate code files from a large corpus of permissively licensed code, where regions of code have been randomly masked and moved to the end of each file, allowing code infilling with bidirectional context. Our model is the first generative model that is able to directly perform zero-shot code infilling, which we evaluate on challenging tasks such as type inference, comment generation, and variable re-naming. We find that the ability to condition on bidirectional context substantially improves performance on these tasks, while still performing comparably to left-to-right-only models pretrained at similar scale on standard program synthesis benchmarks. The InCoder models and code are publicly released at https://sites.google.com/view/incoder-code-models
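To make the training transformation described above concrete, the following is a minimal sketch of moving a randomly masked span to the end of a file so that an ordinary left-to-right language model learns to infill it with bidirectional context. The sentinel strings and function names here are illustrative assumptions, not the paper's exact preprocessing or tokenization.

```python
import random

# Illustrative sentinel strings (assumed names; the actual special tokens may differ).
MASK = "<|mask:0|>"
EOM = "<|endofmask|>"

def make_infilling_example(document: str, rng: random.Random) -> str:
    """Mask one random contiguous span and move it to the end of the document.

    The resulting sequence can be trained on with a standard left-to-right
    language-modeling objective, yet teaches the model to generate the missing
    span conditioned on both the code before and after it.
    """
    n = len(document)
    start = rng.randrange(n)
    end = rng.randrange(start + 1, n + 1)
    prefix, span, suffix = document[:start], document[start:end], document[end:]
    # Layout: prefix <MASK> suffix <MASK> span <EOM>
    # Both sides of the hole appear before the target span in the sequence.
    return f"{prefix}{MASK}{suffix}{MASK}{span}{EOM}"

if __name__ == "__main__":
    rng = random.Random(0)
    print(make_infilling_example("def add(a, b):\n    return a + b\n", rng))
```

Under this framing, zero-shot infilling at inference time amounts to presenting the model with the prefix, a mask sentinel, and the suffix, then sampling the span (terminated by the end-of-mask marker) with ordinary left-to-right decoding.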