Although scaling up language model size has reliably improved performance on a range of NLP tasks, even the largest models currently struggle with certain reasoning tasks such as math word problems, symbolic manipulation, and commonsense reasoning. This paper explores the ability of language models to generate a coherent chain of thought -- a series of short sentences that mimic the reasoning process a person might have when responding to a question. Experiments show that inducing a chain of thought via prompting can enable sufficiently large language models to better perform reasoning tasks that otherwise have flat scaling curves.