Theorem proving in natural mathematical language - the mixture of symbolic and natural language used by humans - plays a central role in mathematical advances and education, and tests aspects of reasoning that are core to intelligence. Yet it has remained underexplored with modern generative models. We study large-scale language models on two new generation tasks: suggesting the next step in a mathematical proof, and full proof generation. We develop NaturalProver, a language model that generates proofs by conditioning on background references (e.g. theorems and definitions that are either retrieved or human-provided), and optionally enforces their presence with constrained decoding. On theorems from the NaturalProofs benchmark, NaturalProver improves the quality of next-step suggestions and generated proofs over fine-tuned GPT-3, according to human evaluations from university-level mathematics students. NaturalProver is capable of proving some theorems that require short (2-6 step) proofs, and providing next-step suggestions that are rated as correct and useful over 40% of the time, which is to our knowledge the first demonstration of these capabilities using neural language models.
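To make the decoding setup concrete, here is a minimal sketch of reference-conditioned generation with lexically constrained decoding. This is an illustration only, not NaturalProver's exact procedure (which uses its own constrained search over proof steps); it uses Hugging Face's constrained beam search via `force_words_ids`, and the model name, reference text, and forced phrase are placeholders.

```python
# Sketch: condition a language model on background references and enforce
# that a chosen reference phrase appears in the generated proof.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Condition on retrieved or human-provided references by prepending them.
references = ["Definition: A group is a set with an associative binary operation ..."]
theorem = "Theorem: The identity element of a group is unique."
prompt = "\n".join(references) + "\n" + theorem + "\nProof:"

# Optionally enforce the presence of a reference phrase in the output.
force_words = ["identity element"]  # hypothetical constraint
force_ids = tokenizer(force_words, add_special_tokens=False).input_ids

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    force_words_ids=force_ids,  # constrained beam search needs num_beams > 1
    num_beams=5,
    max_new_tokens=128,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

In this sketch, conditioning is simple prompt concatenation and the constraint guarantees the forced phrase appears somewhere in the beam-searched output; the paper's system applies the analogous ideas to full proofs and next-step suggestions.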