We develop an approach for improving the trustworthiness and overall accuracy of program synthesizers based on large language models for source code. Given a natural language description of a programming problem, our method samples both candidate programs and candidate predicates specifying how the program should behave. We learn to analyze the agreement between programs and predicates to judge both which program is most likely to be correct and whether the language model is able to solve the programming problem in the first place. This latter capacity allows favoring high precision over broad recall: fostering trust by only proposing a program when the system is certain that it is correct.
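As a rough illustration of the select-or-abstain idea described in the abstract, the following minimal sketch scores each candidate program by the fraction of candidate predicates it satisfies and proposes a program only when the best score clears a threshold. This is an assumption-laden stand-in: the abstract describes a learned analysis of program-predicate agreement, whereas this sketch substitutes a fixed pass-rate heuristic. The names `passes`, `select_or_abstain`, and `threshold`, and the toy candidates, are hypothetical and do not come from the paper.

```python
from typing import Callable, Optional, Sequence

# Hypothetical types: a program maps an int to an int, and a predicate
# inspects a program and reports whether it behaves as expected.
Program = Callable[[int], int]
Predicate = Callable[[Program], bool]


def passes(program: Program, predicate: Predicate) -> bool:
    """Return True if the predicate holds; a crash counts as a failure."""
    try:
        return bool(predicate(program))
    except Exception:
        return False


def select_or_abstain(
    programs: Sequence[Program],
    predicates: Sequence[Predicate],
    threshold: float = 0.9,
) -> Optional[int]:
    """Return the index of the best-agreeing program, or None to abstain.

    Assumption: agreement is a plain pass rate, not a learned model.
    """
    if not programs or not predicates:
        return None
    scores = [
        sum(passes(p, q) for q in predicates) / len(predicates)
        for p in programs
    ]
    best = max(range(len(programs)), key=scores.__getitem__)
    # Favor precision over recall: propose nothing unless agreement is high.
    return best if scores[best] >= threshold else None


# Toy stand-ins for sampled candidates for "return the absolute value of x".
candidate_programs = [
    lambda x: x,
    lambda x: -x,
    lambda x: x if x >= 0 else -x,
]
candidate_predicates = [
    lambda f: f(3) == 3,
    lambda f: f(-4) == 4,
    lambda f: f(0) == 0,
    lambda f: f(-1) >= 0,
]

chosen = select_or_abstain(candidate_programs, candidate_predicates)
```

Raising `threshold` makes the sketch abstain more often, which trades recall for precision in the sense the abstract describes. Because sampled predicates can themselves be wrong, a fixed pass rate is only a crude proxy for the learned judgment.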