Sampling diverse programs from a code language model and reranking with model likelihood is a popular method for code generation, but it is prone to preferring degenerate solutions. Inspired by collaborative programming, we propose Coder-Reviewer reranking. We augment Coder language models from past work, which generate programs given language instructions, with Reviewer models, which evaluate the likelihood of the instruction given the generated programs. We perform an extensive study across six datasets with eight models from three model families. Experimental results show that Coder-Reviewer reranking leads to consistent and significant improvement (up to 17% absolute accuracy gain) over reranking with the Coder model only. When combined with executability filtering, Coder-Reviewer reranking can often outperform the minimum Bayes risk method. Coder-Reviewer reranking is easy to implement by prompting, generalizes to different programming languages, and works well with off-the-shelf hyperparameters.
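To make the scoring rule concrete, below is a minimal Python sketch, not the authors' implementation. It assumes two hypothetical callables: coder_logprob(x, y), approximating the Coder model's log p(program | instruction), and reviewer_logprob(y, x), approximating the Reviewer model's log p(instruction | program), both of which could be obtained by prompting a language model and summing token log-probabilities. The executability filter is likewise simplified.

```python
# Sketch of Coder-Reviewer reranking with executability filtering.
# `coder_logprob` and `reviewer_logprob` are hypothetical placeholders
# for prompting-based log-likelihood queries; they are not an API
# defined by the paper.

def executable(program: str) -> bool:
    """Crude executability filter: keep candidates that run without
    raising. (A real filter would sandbox execution and may run the
    program on example inputs.)"""
    try:
        exec(program, {})
        return True
    except Exception:
        return False

def coder_reviewer_rerank(instruction, candidates,
                          coder_logprob, reviewer_logprob):
    """Return the candidate program y maximizing the combined score
    log p(y | instruction) + log p(instruction | y)."""
    # Fall back to the full candidate pool if nothing passes the filter.
    pool = [y for y in candidates if executable(y)] or list(candidates)
    return max(
        pool,
        key=lambda y: coder_logprob(instruction, y)
                      + reviewer_logprob(y, instruction),
    )
```

Reranking with the Coder likelihood alone corresponds to dropping the reviewer_logprob term; the Reviewer term penalizes degenerate programs that are likely a priori but do not explain the instruction.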