In theorem proving, the task of selecting useful premises from a large library to unlock the proof of a given conjecture is crucially important. This presents a challenge for all theorem provers, especially those based on language models, due to their relative inability to reason over huge volumes of premises in text form. This paper introduces Thor, a framework integrating language models and automated theorem provers to overcome this difficulty. In Thor, a class of methods called hammers, which leverage the power of automated theorem provers, is used for premise selection, while all other tasks are delegated to language models. Thor increases a language model's success rate on the PISA dataset from $39\%$ to $57\%$, while solving $8.2\%$ of problems that neither language models nor automated theorem provers are able to solve on their own. Furthermore, with a significantly smaller computational budget, Thor achieves a success rate on the MiniF2F dataset that is on par with the best existing methods. Thor can be instantiated for the majority of popular interactive theorem provers via a straightforward protocol we provide.
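To make the division of labour concrete, the following is a minimal sketch of the kind of interaction loop the abstract describes: a language model proposes proof steps, and whenever it chooses to delegate premise selection, a hammer is invoked instead. The names `isabelle_env`, `lm.suggest`, `run_hammer`, `apply_step`, `is_finished`, and the `<hammer>` marker are illustrative assumptions, not the paper's actual API.

```python
# Minimal sketch (assumed interfaces, not the authors' implementation) of a
# Thor-style proving loop: the language model drives the proof, and premise
# selection is handed off to a hammer when the model asks for it.

def prove(conjecture, lm, isabelle_env, max_steps=50):
    """Attempt to prove `conjecture` by alternating LM steps and hammer calls."""
    state = isabelle_env.init(conjecture)           # initial proof state
    for _ in range(max_steps):
        step = lm.suggest(state)                    # LM proposes the next proof step
        if step == "<hammer>":                      # LM delegates premise selection
            ok, state = isabelle_env.run_hammer(state)
        else:
            ok, state = isabelle_env.apply_step(state, step)
        if not ok:                                  # step rejected by the prover;
            return None                             # a real system would backtrack/search
        if isabelle_env.is_finished(state):
            return state                            # proof complete
    return None                                     # step budget exhausted
```

The key design point the sketch illustrates is that the language model never has to enumerate or rank premises in text form; that burden is shifted entirely to the automated-theorem-prover-backed hammer call.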