Large Language Models (LLMs) have shown remarkable performance on a variety of basic natural language tasks, which raises hopes for achieving Artificial General Intelligence. To better complete complex tasks, we want an LLM to first write a program for the task and then follow that program to generate a specific solution for each test sample. We propose using natural language as a new programming language to describe task procedures, making them easily understandable to both humans and LLMs. LLMs can generate natural language programs directly, but these programs may still contain factual errors or incomplete steps. We therefore propose the Learning to Program (LP) method, which asks LLMs themselves to learn natural language programs from the training dataset of a complex task and then uses the learned program to guide inference. Our experiments on the AMPS (high school math) and Math (competition mathematics) datasets demonstrate the effectiveness of our approach. When testing ChatGPT on 10 tasks from the AMPS dataset, LP outperforms direct zero-shot prompting by 18.3$\%$ on average. We release our code at \url{https://github.com/microsoft/NaturalLanguageProgram}.
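As a minimal sketch of the two-phase LP procedure described above (the prompts, the \texttt{chat} helper, and the single-epoch refinement loop below are our illustrative assumptions, not the released implementation), the learning phase iteratively asks the LLM to revise a natural language program against training examples, and the inference phase prepends the learned program to each test question:

\begin{verbatim}
# Hypothetical sketch of Learning to Program (LP): an LLM iteratively
# refines a natural-language program on training data, then the learned
# program guides inference on test samples.

def chat(prompt: str) -> str:
    """Placeholder for an LLM call (e.g., a ChatGPT API request)."""
    raise NotImplementedError

def learn_program(train_set, epochs=2):
    # Start from a generic program and revise it on each training example.
    program = "Describe, step by step, how to solve this type of task."
    for _ in range(epochs):
        for question, answer in train_set:
            program = chat(
                "Current task program:\n" + program +
                "\n\nTraining example:\nQ: " + question + "\nA: " + answer +
                "\n\nRevise the program so that following it step by step "
                "yields the correct answer. Return only the revised program."
            )
    return program

def solve(program, question):
    # Inference: the learned program guides the solution for a test sample.
    return chat(
        "Follow this program step by step to solve the question.\n"
        "Program:\n" + program + "\n\nQuestion: " + question
    )
\end{verbatim}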