In this work we study how to learn good algorithms for selecting reasoning steps in theorem proving. We explore this in the connection tableau calculus implemented by leanCoP, where the partial tableau provides a clean and compact notion of a state to which a limited number of inferences can be applied. We start by incorporating a state-of-the-art learning algorithm -- a graph neural network (GNN) -- into the plCoP theorem prover. Then we use it to observe the system's behaviour in a reinforcement learning setting, i.e., when learning inference guidance from successful Monte-Carlo tree searches on many problems. Despite its better pattern-matching capability, the GNN initially performs worse than a simpler, previously used learning algorithm. We observe that the simpler algorithm is less confident, i.e., its recommendations have higher entropy. This leads us to explore how the entropy of the inference selection implemented via the neural network influences the proof search. This is related to research on human decision-making under uncertainty, and in particular to probability matching theory. Our main result shows that proper entropy regularisation, i.e., training the GNN not to be overconfident, greatly improves plCoP's performance on a large mathematical corpus.
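To make entropy regularisation concrete, the following is a minimal sketch of one standard formulation; the exact objective used in the paper may differ, and the symbols below are illustrative assumptions. Let p_theta denote the GNN's distribution over the inferences applicable in the current tableau state, and let pi denote a target policy extracted from successful Monte-Carlo tree searches. A negative-entropy penalty is added to the usual cross-entropy loss:

    L(theta) = CE(p_theta, pi) - lambda * H(p_theta),
    where H(p_theta) = - sum_a p_theta(a) log p_theta(a) and lambda > 0.

Since minimising L now rewards high entropy, a larger lambda penalises low-entropy (overconfident) recommendations and nudges the learned inference selection towards the probability-matching behaviour mentioned above.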