Recent Language Models (LMs) achieve breakthrough performance in code generation when trained on human-authored problems, even solving some competitive-programming problems. Self-play has proven useful in games such as Go, and thus it is natural to ask whether LMs can generate their own instructive programming problems to improve their performance. We show that it is possible for an LM to synthesize programming problems and solutions, which are filtered for correctness by a Python interpreter. The LM's performance is then seen to improve when it is fine-tuned on its own synthetic problems and verified solutions; thus the model 'improves itself' using the Python interpreter. Problems are specified formally as programming puzzles [Schuster et al., 2021], a code-based problem format where solutions can easily be verified for correctness by execution. In experiments on publicly-available LMs, test accuracy more than doubles. This work demonstrates the potential for code LMs, with an interpreter, to generate instructive problems and improve their own performance.
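To make the puzzle format concrete, here is a minimal sketch of what a programming puzzle looks like in the style of Schuster et al. [2021]: a puzzle is a Python function `f` that returns `True` exactly when its argument is a valid solution, so correctness can be checked mechanically by the interpreter. The specific puzzle below (find a string containing exactly five `'a'` characters) is an illustrative example of ours, not one taken from the paper.

```python
def f(s: str) -> bool:
    """Puzzle: return True iff `s` is a length-5 string of 'a's."""
    return len(s) == 5 and s.count("a") == 5

def solve() -> str:
    """A candidate solution; verified simply by running f on it."""
    return "a" * 5

# The interpreter acts as the verifier: a solution is accepted
# only if f(solve()) evaluates to True.
assert f(solve())
```

In the self-improvement loop described above, the model would generate many such `(f, solve)` pairs, keep only those where execution confirms `f(solve()) is True`, and fine-tune on the verified pairs.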