We propose a novel approach to interactive theorem proving (ITP) using deep reinforcement learning. The proposed framework learns proof search strategies as well as tactic and argument prediction in an end-to-end manner. We formulate the process of ITP as a Markov decision process (MDP) in which each state represents a set of potential derivation paths. This structure allows us to introduce a novel backtracking mechanism that enables the agent to efficiently discard (predicted) dead-end derivations and restart the search from promising alternatives. We implement the framework in the HOL4 theorem prover. Experimental results show that the framework outperforms the existing automated theorem provers (i.e., hammers) available in HOL4 when evaluated on unseen problems. We further investigate the role of key components of the framework through ablation studies.
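To make the MDP formulation concrete, the following is a minimal sketch (not the authors' implementation) of a state as a set of candidate derivation paths, where backtracking amounts to abandoning the current path and resuming from a previously recorded one. All names here (`Fringe`, `ProverState`, `step`) are hypothetical illustrations, not HOL4 or paper APIs.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Fringe:
    """One candidate derivation path: the goals still to be proved."""
    goals: tuple[str, ...]

    def is_proved(self) -> bool:
        # A path with no remaining goals is a completed proof.
        return len(self.goals) == 0

@dataclass
class ProverState:
    """MDP state: the set of derivation paths explored so far.

    Keeping every fringe (rather than only the current one) is what
    allows backtracking: the agent may expand any recorded fringe,
    not just the most recent, so dead ends can simply be abandoned.
    """
    fringes: list[Fringe] = field(default_factory=list)

    def step(self, fringe_idx: int, new_goals: tuple[str, ...]) -> None:
        """Record the result of applying some tactic to a chosen fringe.

        The environment (the prover) produces `new_goals`; the chosen
        index may point at an old fringe, which realizes backtracking.
        """
        assert 0 <= fringe_idx < len(self.fringes)
        self.fringes.append(Fringe(new_goals))

# Hypothetical usage: the agent repeatedly picks a fringe, a tactic,
# and arguments; the resulting subgoals become a new fringe.
state = ProverState([Fringe(("|- p /\\ q ==> q /\\ p",))])
state.step(0, ("|- q /\\ p",))  # one tactic application (illustrative)
state.step(0, ())               # backtrack to fringe 0; this attempt closes the goal
print(any(f.is_proved() for f in state.fringes))  # True: a proof was found
```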