Existing table question answering datasets contain abundant factual questions that primarily evaluate a system's ability to comprehend queries and table schemas, but they lack questions that require complex reasoning and integration of information, owing to the constraint of their short-form answers. To address these issues and to demonstrate the full challenge of table question answering, we introduce FeTaQA, a new dataset of 10K Wikipedia-based {table, question, free-form answer, supporting table cells} instances. FeTaQA yields a more challenging table question answering setting because it requires generating free-form text answers after retrieving, inferring over, and integrating multiple discontinuous facts from a structured knowledge source. Unlike generative QA datasets over text, in which answers are largely copies of short text spans from the source, answers in our dataset are human-written explanations involving entities and their high-level relations. We provide two benchmark methods for the proposed task: a pipeline method built on semantic-parsing-based QA systems and an end-to-end method built on large pretrained text generation models, and we show that FeTaQA poses a challenge to both.
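For concreteness, one FeTaQA instance might be represented as sketched below. The field names (table, question, answer, highlighted_cells) and the table content are illustrative assumptions, not the dataset's official schema; they only show how a free-form answer is paired with a table and the discontinuous cells that support it.

# A minimal, hypothetical sketch of a single FeTaQA-style instance.
# Field names and table content are illustrative assumptions.
example = {
    # Table serialized as a header row plus data rows.
    "table": {
        "header": ["Year", "Title", "Role", "Notes"],
        "rows": [
            ["2014", "Film A", "Lead", "Debut role"],
            ["2016", "Film B", "Supporting", "Award nominee"],
        ],
    },
    # A question whose answer must integrate several discontinuous cells.
    "question": "How did the actor's roles change between 2014 and 2016?",
    # A free-form, sentence-length answer rather than a short extracted span.
    "answer": (
        "The actor debuted with a lead role in Film A in 2014 and "
        "took a supporting, award-nominated role in Film B in 2016."
    ),
    # (row, column) coordinates of the supporting table cells.
    "highlighted_cells": [(0, 0), (0, 2), (1, 0), (1, 2), (1, 3)],
}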