Generative models of code, pretrained on large corpora of programs, have shown great success in translating natural language to code (Chen et al., 2021; Austin et al., 2021; Li et al., 2022, inter alia). While these models do not explicitly incorporate program semantics (i.e., execution results) during training, they are able to generate correct solutions for many problems. However, choosing a single correct program from a generated set for each problem remains challenging. In this work, we introduce execution-result-based minimum Bayes risk decoding (MBR-EXEC) for program selection and show that it improves the few-shot performance of pretrained code models on natural-language-to-code tasks. We select an output program from a generated candidate set by marginalizing over program implementations that share the same semantics. Because exact semantic equivalence is intractable, we execute each program on a small number of test inputs to approximate semantic equivalence. Across datasets, execution or simulated execution significantly outperforms methods that do not consider program semantics. We find that MBR-EXEC consistently improves over all execution-unaware selection methods, suggesting that it is an effective approach for natural-language-to-code translation. We open-source our code at github.com/facebookresearch/mbr-exec and data at dl.fbaipublicfiles.com/mbr-exec/mbr-exec-release.zip.
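To make the selection rule concrete, the following is a minimal Python sketch (not the authors' released implementation): among sampled candidate programs, it picks one whose execution results on a handful of test inputs agree with the most other samples. The `run` helper, which executes a generated program string on a single input, is a hypothetical stand-in for a sandboxed executor.

```python
from collections import Counter
from typing import Any, Callable, List, Sequence

def mbr_exec_select(
    candidates: List[str],
    run: Callable[[str, Any], Any],  # hypothetical sandboxed executor
    test_inputs: Sequence[Any],
) -> str:
    """Select the candidate whose execution results agree with the most
    other candidates (MBR under a 0/1 execution-matching loss)."""

    def signature(program: str) -> tuple:
        # Execution signature: the program's outputs on the shared
        # test inputs. Crashing programs get an error sentinel per
        # input (a simplification of how failures could be scored).
        outs = []
        for x in test_inputs:
            try:
                outs.append(repr(run(program, x)))
            except Exception:
                outs.append("<error>")
        return tuple(outs)

    sigs = [signature(p) for p in candidates]
    counts = Counter(sigs)
    # Marginalizing over semantically equivalent implementations:
    # the risk-minimizing choice lies in the largest equivalence
    # class of execution signatures (ties broken by sample order).
    best = max(range(len(candidates)), key=lambda i: counts[sigs[i]])
    return candidates[best]
```

Under a 0/1 loss that compares execution results, the MBR-optimal program is simply a member of the largest semantic-equivalence class, which is why a small number of test inputs suffices as a proxy for full equivalence. Any real use of such a procedure must execute untrusted generated code in a sandbox.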