Humans tame the complexity of mathematical reasoning by developing hierarchies of abstractions. With proper abstractions, solutions to hard problems can be expressed concisely, which makes them more likely to be found. In this paper, we propose Learning Mathematical Abstractions (LEMMA): an algorithm that implements this idea for reinforcement learning agents in mathematical domains. LEMMA augments Expert Iteration with an abstraction step, where solutions found so far are revisited and rewritten in terms of new higher-level actions, which then become available to solve new problems. We evaluate LEMMA on two step-by-step mathematical reasoning tasks: equation solving and fraction simplification. In these two domains, LEMMA improves the ability of an existing agent, which both solves more problems and generalizes more effectively to problems harder than those seen during training.
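To make the abstraction step concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a solution is a list of action names and that a new higher-level action is formed by merging the most frequent pair of adjacent actions. The abstract does not specify how abstractions are chosen, so this pair-merging heuristic is an assumption. The action names (`sub`, `div`, `eval`, `comm`) and the function names are hypothetical.

```python
from collections import Counter


def most_common_pair(solutions):
    """Return the most frequent adjacent action pair, or None if no pair repeats."""
    counts = Counter()
    for sol in solutions:
        counts.update(zip(sol, sol[1:]))
    if not counts:
        return None
    pair, n = counts.most_common(1)[0]
    return pair if n >= 2 else None


def rewrite(solution, pair, name):
    """Replace each non-overlapping occurrence of `pair` with the single action `name`."""
    out, i = [], 0
    while i < len(solution):
        if tuple(solution[i:i + 2]) == pair:
            out.append(name)
            i += 2
        else:
            out.append(solution[i])
            i += 1
    return out


def abstraction_step(solutions):
    """Assumed abstraction step: merge the most common pair into one new action."""
    pair = most_common_pair(solutions)
    if pair is None:
        return solutions, None
    name = "+".join(pair)  # hypothetical naming scheme for the new action
    return [rewrite(s, pair, name) for s in solutions], (name, pair)


# Hypothetical solutions found so far by the agent.
solutions = [
    ["sub", "div", "eval"],
    ["sub", "div"],
    ["comm", "sub", "div", "eval"],
]
rewritten, abstraction = abstraction_step(solutions)
print(abstraction)
print(rewritten)
```

In a full training loop, the rewritten solutions would serve as training data for the next round of Expert Iteration, and the new action would be added to the agent's action space. Both steps are omitted here.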