Answering natural language questions over tables is usually seen as a semantic parsing task. To alleviate the collection cost of full logical forms, one popular approach focuses on weak supervision consisting of denotations instead of logical forms. However, training semantic parsers from weak supervision poses difficulties, and in addition, the generated logical forms are only used as an intermediate step prior to retrieving the denotation. In this paper, we present TAPAS, an approach to question answering over tables without generating logical forms. TAPAS trains from weak supervision, and predicts the denotation by selecting table cells and optionally applying a corresponding aggregation operator to such selection. TAPAS extends BERT's architecture to encode tables as input, initializes from an effective joint pre-training of text segments and tables crawled from Wikipedia, and is trained end-to-end. We experiment with three different semantic parsing datasets, and find that TAPAS outperforms or rivals semantic parsing models by improving state-of-the-art accuracy on SQA from 55.1 to 67.2 and performing on par with the state-of-the-art on WIKISQL and WIKITQ, but with a simpler model architecture. We additionally find that transfer learning, which is trivial in our setting, from WIKISQL to WIKITQ, yields 48.7 accuracy, 4.2 points above the state-of-the-art.
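To make the prediction scheme described above concrete, the following is a minimal sketch (not the authors' released code) of a TAPAS-style model: a BERT-like Transformer encoder over a flattened (question, table) token sequence, augmented with row and column embeddings to encode table structure, plus one head that scores each token for cell selection and one head that chooses an aggregation operator (e.g. NONE, SUM, COUNT, AVERAGE). All class names, dimensions, the two-layer encoder, and the toy input layout are illustrative assumptions, not the paper's actual configuration.

import torch
import torch.nn as nn

class TableQAModel(nn.Module):
    def __init__(self, vocab_size=30522, hidden=256, n_aggregations=4):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, hidden)
        # Extra structural embeddings (row/column indices), analogous in
        # spirit to the embeddings TAPAS adds on top of BERT's input;
        # index 0 marks tokens outside the table (the question).
        self.row_emb = nn.Embedding(64, hidden)
        self.col_emb = nn.Embedding(32, hidden)
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.cell_head = nn.Linear(hidden, 1)              # per-token selection score
        self.agg_head = nn.Linear(hidden, n_aggregations)  # from the first token

    def forward(self, token_ids, row_ids, col_ids):
        x = (self.token_emb(token_ids)
             + self.row_emb(row_ids)
             + self.col_emb(col_ids))
        h = self.encoder(x)
        cell_logits = self.cell_head(h).squeeze(-1)  # (batch, seq_len)
        agg_logits = self.agg_head(h[:, 0])          # (batch, n_aggregations)
        return cell_logits, agg_logits

# Toy forward pass: 4 question tokens followed by a flattened 2x2 table,
# one token per cell, flattened row by row.
token_ids = torch.randint(0, 30522, (1, 8))
row_ids = torch.tensor([[0, 0, 0, 0, 1, 1, 2, 2]])
col_ids = torch.tensor([[0, 0, 0, 0, 1, 2, 1, 2]])
cell_logits, agg_logits = TableQAModel()(token_ids, row_ids, col_ids)
print(cell_logits.shape, agg_logits.shape)  # (1, 8) and (1, 4)

At inference time, such a model would threshold the per-cell selection scores and, if the predicted aggregation is not NONE, apply that operator to the selected cells to produce the final denotation, which is why no intermediate logical form is ever generated.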