Recent literature has shown that large language models (LLMs) are generally excellent few-shot reasoners on text reasoning tasks. However, the capability of LLMs on table reasoning tasks remains underexplored. In this paper, we aim to understand how well LLMs can perform on table tasks with few-shot in-context learning. Specifically, we evaluate LLMs on popular table QA and fact verification datasets such as WikiTableQuestions, FetaQA, TabFact, and FEVEROUS, and find that LLMs are surprisingly competent at complex reasoning over table structures. When combined with `chain of thought' prompting, GPT-3 achieves very strong performance with only a 1-shot demonstration. We further manually study the reasoning chains elicited from LLMs and find that these reasoning chains are highly consistent with the `ground truth' semantic forms. We believe our study opens new possibilities for employing LLMs on a range of table-based reasoning tasks in few-shot scenarios.
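As a minimal sketch of the prompting setup the abstract describes, the snippet below shows one plausible way to build a 1-shot chain-of-thought prompt for table QA: the table is linearized into text, and the single demonstration includes an explicit reasoning chain before the answer. The helper names, table contents, and prompt layout here are illustrative assumptions, not the paper's exact format.

```python
def linearize_table(header, rows):
    """Flatten a table into a pipe-separated text block (one common
    linearization choice; the paper's exact format may differ)."""
    lines = [" | ".join(header)]
    lines += [" | ".join(str(cell) for cell in row) for row in rows]
    return "\n".join(lines)


# Hypothetical 1-shot demonstration: table, question, reasoning chain, answer.
DEMO = (
    "Table:\n"
    "Year | Host City\n"
    "2008 | Beijing\n"
    "2012 | London\n"
    "Question: Which city hosted after Beijing?\n"
    "Reasoning: Beijing hosted in 2008; the next listed year is 2012,\n"
    "whose host city is London.\n"
    "Answer: London\n"
)


def build_prompt(header, rows, question):
    """Prepend the single demonstration, then append the new table and
    question, ending at 'Reasoning:' so the model continues the chain."""
    return (
        DEMO
        + "\nTable:\n"
        + linearize_table(header, rows)
        + f"\nQuestion: {question}\nReasoning:"
    )


prompt = build_prompt(
    ["Player", "Goals"],
    [["Messi", 30], ["Ronaldo", 28]],
    "Who scored more goals?",
)
```

The resulting string would then be sent to the LLM (e.g. GPT-3) for completion; the model's continuation supplies the reasoning chain and final answer for the new table.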