Traditional generative models are limited to predicting sequences of terminal tokens. However, ambiguities in the generation task may lead to incorrect outputs. To address this, we introduce Grammformers, transformer-based grammar-guided models that learn, without explicit supervision, to generate sketches -- sequences of tokens with holes. Through reinforcement learning, Grammformers learn to insert holes to avoid generating incorrect tokens where the target task is ambiguous. We train Grammformers for statement-level source code completion, i.e., generating code snippets from an ambiguous user intent such as a partial code context. We evaluate Grammformers on code completion for C# and Python and show that they generate 10-50% more accurate sketches than traditional generative models and 37-50% longer sketches than sketch-generating baselines trained with similar techniques.
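To make the notion of a sketch concrete, the following is a minimal illustrative representation of a token sequence with a hole; the `HOLE` marker, token names, and helper function are assumptions for illustration only, not the paper's actual tokenization or API.

```python
# A sketch is a sequence of terminal tokens in which ambiguous positions are
# replaced by a hole token. HOLE and render_sketch are hypothetical names.
HOLE = "<HOLE>"

def render_sketch(tokens):
    """Join terminal tokens and hole markers into display text."""
    return " ".join(tokens)

# Confident tokens are emitted as-is; the ambiguous argument becomes a hole
# rather than a guessed (and possibly incorrect) concrete token.
sketch = ["logger", ".", "info", "(", HOLE, ")"]
print(render_sketch(sketch))  # logger . info ( <HOLE> )
```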