Reasoning over natural language is a long-standing goal of the research community. However, studies have shown that existing language models are inadequate at reasoning. To address this issue, we present POET, a new pre-training paradigm. By pre-training language models on programs paired with their execution results, POET empowers language models to harvest the reasoning knowledge possessed by program executors via a data-driven approach. POET is conceptually simple and can be instantiated with different kinds of programs. In this paper, we present three empirically powerful instances: POET-Math, POET-Logic, and POET-SQL. Experimental results on six benchmarks demonstrate that POET significantly boosts model performance on natural language reasoning tasks, such as numerical reasoning, logical reasoning, and multi-hop reasoning. Taking the DROP benchmark as a representative example, POET improves the F1 score of BART from 69.2% to 80.6%. Furthermore, POET benefits giant language models as well, pushing the F1 score of T5-11B to 87.6% and achieving new state-of-the-art performance on DROP. POET opens a new door to reasoning-enhanced pre-training, and we hope our analysis will shed light on future research on reasoning like program executors.
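To make the paradigm concrete, below is a minimal sketch of how a single POET-SQL-style pre-training pair might be constructed: a program (a SQL query plus its database context) serialized as the model input, and the executor's output serving as the training target. The helper name, serialization format, and toy data are illustrative assumptions, not details taken from the paper.

```python
import sqlite3

def make_poet_sql_example(create_stmts, sql_query):
    """Build one (input, target) pre-training pair in the spirit of POET-SQL:
    the model reads the program plus its data context and learns to predict
    what a real executor outputs. (Hypothetical helper; formats are assumed.)"""
    # Execute the program with a genuine executor (SQLite) on an in-memory DB.
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    for stmt in create_stmts:
        cur.execute(stmt)
    rows = cur.execute(sql_query).fetchall()
    conn.close()
    # Serialize: query and database context form the input sequence;
    # the execution result becomes the target sequence.
    source = sql_query + " | " + " ; ".join(create_stmts)
    target = " , ".join(str(v) for row in rows for v in row)
    return source, target

pair = make_poet_sql_example(
    ["CREATE TABLE scores(name TEXT, pts INT)",
     "INSERT INTO scores VALUES ('a', 3), ('b', 5)"],
    "SELECT SUM(pts) FROM scores",
)
```

A corpus of such pairs can be generated automatically at scale, which is what makes this a data-driven way to transfer an executor's reasoning behavior into a language model.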