Thor:用断断锤整合语言模型和自动理论验证 (Thor: Wielding Hammers to Integrate Language Models and Automated Theorem Provers)

In theorem proving, the task of selecting useful premises from a large library to unlock the proof of a given conjecture is crucially important. This presents a challenge for all theorem provers, especially the ones based on language models, due to their relative inability to reason over huge volumes of premises in text form. This paper introduces Thor, a framework integrating language models and automated theorem provers to overcome this difficulty. In Thor, a class of methods called hammers that leverage the power of automated theorem provers are used for premise selection, while all other tasks are designated to language models. Thor increases a language model's success rate on the PISA dataset from $39\%$ to $57\%$, while solving $8.2\%$ of problems neither language models nor automated theorem provers are able to solve on their own. Furthermore, with a significantly smaller computational budget, Thor can achieve a success rate on the MiniF2F dataset that is on par with the best existing methods. Thor can be instantiated for the majority of popular interactive theorem provers via a straightforward protocol we provide.

翻译：从一个大型图书馆挑选有用的馆舍,以解开某一假设的证明,这是至关重要的。这对所有理论验证者,特别是基于语言模型的验证者,提出了挑战,因为相对而言,他们无法对大量文本形式的馆舍进行合理解释。本文介绍了索尔(Thor),这是一个整合语言模型和自动理论验证器的框架,以克服这一困难。在索尔(Thor),一种称为锤子的方法,利用自动理论验证器的力量进行前置选择,而所有其他任务则指定给语言模型。托尔(Thor)将PISA数据集中的语言模型成功率从39美元增加到57美元,同时解决语言模型和自动理论验证器无法自行解决问题的8.2美元问题。此外,由于计算预算要少得多,索尔可以在与现有最佳方法相匹配的小型F2F数据集上实现成功率。索尔(Thor)可以通过我们提供的一个直接的协议,为大多数流行的互动理论验证者提供即时。

相关内容

Automator

关注 5

Automator是苹果公司为他们的Mac OS X系统开发的一款软件。 只要通过点击拖拽鼠标等操作就可以将一系列动作组合成一个工作流，从而帮助你自动的（可重复的）完成一些复杂的工作。Automator还能横跨很多不同种类的程序，包括：查找器、Safari网络浏览器、iCal、地址簿或者其他的一些程序。它还能和一些第三方的程序一起工作，如微软的Office、Adobe公司的Photoshop或者Pixelmator等。

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

60+阅读 · 2022年4月22日

【干货书】机器学习速查手册，135页pdf

专知会员服务

127+阅读 · 2020年11月20日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日