皮诺:学习正规数学原因 (Peano: Learning Formal Mathematical Reasoning)

General mathematical reasoning is computationally undecidable, but humans routinely solve new problems. Moreover, discoveries developed over centuries are taught to subsequent generations quickly. What structure enables this, and how might that inform automated mathematical reasoning? We posit that central to both puzzles is the structure of procedural abstractions underlying mathematics. We explore this idea in a case study on 5 sections of beginning algebra on the Khan Academy platform. To define a computational foundation, we introduce Peano, a theorem-proving environment where the set of valid actions at any point is finite. We use Peano to formalize introductory algebra problems and axioms, obtaining well-defined search problems. We observe existing reinforcement learning methods for symbolic reasoning to be insufficient to solve harder problems. Adding the ability to induce reusable abstractions ("tactics") from its own solutions allows an agent to make steady progress, solving all problems. Furthermore, these abstractions induce an order to the problems, seen at random during training. The recovered order has significant agreement with the expert-designed Khan Academy curriculum, and second-generation agents trained on the recovered curriculum learn significantly faster. These results illustrate the synergistic role of abstractions and curricula in the cultural transmission of mathematics.

翻译：一般数学推理在计算上是不可分的,但人类通常会解决新的问题。此外,几个世纪以来的发现会很快地教给后代。什么结构能让这些发现成为自动数学推理的基础,以及这种结果如何能为自动数学推理提供依据?我们假设两个谜题的核心是数学背后的程序抽象结构。我们在对汗学院平台上开始代数的5个部分的案例研究中探索了这个想法。为了定义计算基础,我们引入了Peano,这是一个理论测试环境,在这个环境中,有效的行动在任何时刻都是有限的。我们利用Peano将入门代数问题和轴承问题正规化,并获得定义明确的搜索问题。我们观察现有的强化学习方法,象征性推理方法不足以解决更棘手的问题。增加从其自身解决方案中产生可再使用的抽象(“实践”)的能力,使代理人能够取得稳步的进展,解决所有问题。此外,这些抽象学让在培训期间随机看到的问题产生秩序。恢复的秩序与专家设计的Khan学院课程和在恢复的数学课程中受过训练的第二代代理人有着重要的协议。这些结果说明了抽象课程的遗传性作用。