Generative models of code, pretrained on large corpora of programs, have shown great success in translating natural language to code (Chen et al., 2021; Austin et al., 2021; Li et al., 2022, inter alia). While these models do not explicitly incorporate program semantics (i.e., execution results) during training, they are able to generate correct solutions for many problems. However, choosing a single correct program from among a generated set for each problem remains challenging. In this work, we introduce execution result-based minimum Bayes risk decoding (MBR-EXEC) for program selection and show that it improves the few-shot performance of pretrained code models on natural-language-to-code tasks. We select output programs from a generated candidate set by marginalizing over program implementations that share the same semantics. Because exact equivalence is intractable, we execute each program on a small number of test inputs to approximate semantic equivalence. Across datasets, execution or simulated execution significantly outperforms methods that do not involve program semantics. We find that MBR-EXEC consistently improves over all execution-unaware selection methods, suggesting it as an effective approach for natural-language-to-code translation.
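The selection procedure described above can be sketched in a few lines: execute every candidate on a handful of test inputs, group candidates by their execution results, and return a member of the largest semantic-equivalence class (the minimum-Bayes-risk choice under an exact-match loss). This is a minimal illustration under assumed conventions, not the paper's implementation; in particular, the assumption that each candidate defines a function named `solve` is ours.

```python
from collections import Counter


def execute(program_src, test_input):
    """Run one candidate program on one test input.

    Assumes (for illustration) that each candidate defines a function
    named `solve`. Returns the output, or None on any execution error.
    """
    env = {}
    try:
        exec(program_src, env)
        return env["solve"](test_input)
    except Exception:
        return None


def mbr_exec(candidates, test_inputs):
    """Select the candidate whose execution results agree with the most
    other candidates (MBR decoding with an exact-match execution loss)."""
    # Semantic signature: the tuple of outputs on the shared test inputs.
    # repr() makes unhashable outputs comparable.
    sigs = [tuple(repr(execute(c, x)) for x in test_inputs)
            for c in candidates]
    counts = Counter(sigs)
    # Pick a representative of the largest semantic-equivalence class.
    best = max(range(len(candidates)), key=lambda i: counts[sigs[i]])
    return candidates[best]
```

For example, given three candidates where two compute `2 * x` (one as `x + x`) and one returns `x` unchanged, the two semantically equivalent programs form the larger class, so `mbr_exec` returns one of them even though no candidate is a majority by surface form.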