Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning

Mathematical reasoning, a core ability of human intelligence, presents unique challenges for machines in abstract thinking and logical reasoning. Recent large pre-trained language models such as GPT-3 have achieved remarkable progress on mathematical reasoning tasks written in text form, such as math word problems (MWP). However, it remains unknown whether these models can handle more complex problems that involve mathematical reasoning over heterogeneous information, such as tabular data. To fill this gap, we present Tabular Math Word Problems (TabMWP), a new dataset containing 38,431 open-domain grade-level problems that require mathematical reasoning on both textual and tabular data. Each question in TabMWP is aligned with a tabular context, which is presented as an image, semi-structured text, and a structured table. There are two types of questions, free-text and multi-choice, and each problem is annotated with a gold solution that reveals the multi-step reasoning process. We evaluate different pre-trained models on TabMWP, including the GPT-3 model in a few-shot setting. As earlier studies suggest, because few-shot GPT-3 relies on the selection of in-context examples, its performance is unstable and can degrade to near chance. This instability is more severe when handling complex problems like those in TabMWP. To mitigate this, we further propose a novel approach, PromptPG, which uses policy gradient to learn to select in-context examples from a small amount of training data and then constructs the corresponding prompt for the test example. Experimental results show that our method outperforms the best baseline by 5.31% in accuracy and significantly reduces the prediction variance compared to random selection, which verifies its effectiveness in selecting in-context examples.
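The idea of learning to select in-context examples with policy gradient can be illustrated with a toy sketch. This is not the PromptPG implementation: it assumes a context-free softmax policy over a fixed pool of candidates, a REINFORCE update with a running-average baseline, and a simulated reward of +1 or -1 standing in for whether the language model answers correctly. The names `train_selector`, `simulated_reward`, `HELPFULNESS`, and `build_prompt` are all hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_selector(reward_fn, num_candidates, steps=3000, lr=0.1, seed=0):
    """REINFORCE on a softmax policy that picks one candidate example.

    Returns the learned selection probabilities.
    """
    rng = random.Random(seed)
    logits = [0.0] * num_candidates
    baseline = 0.0  # running average of rewards, used to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        choice = rng.choices(range(num_candidates), weights=probs)[0]
        reward = reward_fn(choice, rng)
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # d/d logit_j of log pi(choice) = 1[j == choice] - probs[j]
        for j in range(num_candidates):
            grad = (1.0 if j == choice else 0.0) - probs[j]
            logits[j] += lr * advantage * grad
    return softmax(logits)


# Assumption: made-up chance that the model answers correctly when each
# candidate is used as the in-context example. A real setup would query
# the language model and compare its answer with the gold answer.
HELPFULNESS = [0.1, 0.3, 0.95, 0.2]


def simulated_reward(choice, rng):
    """+1 for a (simulated) correct answer, -1 otherwise."""
    return 1.0 if rng.random() < HELPFULNESS[choice] else -1.0


def build_prompt(example, question):
    """Place the selected in-context example before the test question."""
    return f"{example}\n\n{question}"


if __name__ == "__main__":
    probs = train_selector(simulated_reward, len(HELPFULNESS))
    print([round(p, 3) for p in probs])
```

In this toy setting the policy should shift its probability mass toward the candidate with the highest simulated success rate, whereas uniform random selection would keep drawing the unhelpful candidates as well. A policy conditioned on the test problem would replace the fixed logits with scores computed from the problem and each candidate.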
