Recent literature has shown that large language models (LLMs) are generally excellent few-shot reasoners on text reasoning tasks. However, the capability of LLMs on table reasoning tasks is yet to be explored. In this paper, we aim to understand how well LLMs can perform table-related tasks with few-shot in-context learning. Specifically, we evaluated LLMs on popular table QA and fact verification datasets like WikiTableQuestion, FetaQA, TabFact, and FEVEROUS, and found that LLMs are competent at complex reasoning over table structures, even though these models are not pre-trained on any table corpus. When combined with `chain of thoughts' prompting, LLMs can achieve very strong performance with only a 1-shot demonstration, even on par with some SoTA models. We show that LLMs are even more competent at generating comprehensive long-form answers on FetaQA than a fine-tuned T5-large. We further manually studied the reasoning chains elicited from LLMs and found that these reasoning chains are highly consistent with the underlying semantic form. We believe that LLMs can serve as a simple yet generic baseline for future research. The code and data are released at https://github.com/wenhuchen/TableCoT.
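To make the setup concrete, below is a minimal sketch of how a table could be linearized into text and paired with a 1-shot chain-of-thought demonstration for table QA. This is not the paper's exact prompt format; the separators, instruction wording, helper names (`linearize_table`, `build_prompt`), and the demonstration content are illustrative assumptions.

```python
# Sketch: building a 1-shot chain-of-thought prompt for table QA.
# The prompt wording and table serialization below are assumptions, not the
# exact format used in the TableCoT repository.

def linearize_table(header, rows):
    """Serialize a table into plain text, one row per line."""
    lines = [" | ".join(header)]
    lines += [" | ".join(str(cell) for cell in row) for row in rows]
    return "\n".join(lines)

# Hypothetical 1-shot demonstration: a table, a question, and a worked reasoning chain.
demo_table = linearize_table(
    ["Year", "Host City", "Gold Medals"],
    [[2008, "Beijing", 48], [2012, "London", 38]],
)
demonstration = (
    "Read the table and answer the question step by step.\n"
    f"Table:\n{demo_table}\n"
    "Question: In which year did the host city win more gold medals?\n"
    "Reasoning: Beijing (2008) has 48 gold medals and London (2012) has 38. "
    "48 is greater than 38, so the answer is 2008.\n"
    "Answer: 2008\n"
)

def build_prompt(header, rows, question):
    """Prepend the 1-shot CoT demonstration to the test table and question."""
    test_table = linearize_table(header, rows)
    return (
        f"{demonstration}\n"
        "Read the table and answer the question step by step.\n"
        f"Table:\n{test_table}\n"
        f"Question: {question}\n"
        "Reasoning:"
    )

if __name__ == "__main__":
    prompt = build_prompt(
        ["Player", "Team", "Points"],
        [["A. Smith", "Hawks", 31], ["B. Jones", "Bulls", 27]],
        "Which player scored the most points?",
    )
    # An LLM's completion of this prompt would contain the reasoning chain and answer.
    print(prompt)
```

The resulting string would be sent to an LLM as-is; the model's completion is expected to continue the "Reasoning:" line and end with an "Answer:" line, mirroring the demonstration.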