Table entailment, the binary classification task of deciding whether a sentence is supported or refuted by the content of a table, requires parsing language and table structure as well as numerical and discrete reasoning. While there is extensive work on textual entailment, table entailment is less well studied. We adapt TAPAS (Herzig et al., 2020), a table-based BERT model, to recognize entailment. Motivated by the benefits of data augmentation, we create a balanced dataset of millions of automatically generated training examples, which the model learns from in an intermediate step prior to fine-tuning. This new data is useful not only for table entailment but also for SQA (Iyyer et al., 2017), a sequential table QA task. To accommodate long examples as input to BERT models, we evaluate table pruning techniques as a pre-processing step that drastically improves training and prediction efficiency at a moderate drop in accuracy. These methods set a new state of the art on the TabFact (Chen et al., 2020) and SQA datasets.
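To illustrate the kind of table pruning referred to above, the following is a minimal sketch of a word-overlap column-selection heuristic: columns whose header and cell tokens share the most words with the input statement are kept, and the rest are dropped so the flattened table fits the model's input length budget. The function names, the scoring rule, and the `max_columns` parameter are illustrative assumptions, not the paper's exact procedure.

```python
import re


def tokenize(text):
    """Lowercased word tokenizer used for the overlap heuristic."""
    return set(re.findall(r"\w+", text.lower()))


def prune_columns(statement, header, rows, max_columns=4):
    """Keep the columns with the highest token overlap with the statement.

    header: list of column names; rows: list of cell-value lists.
    Returns the pruned header and rows, preserving column order.
    """
    statement_tokens = tokenize(statement)
    scores = []
    for col, name in enumerate(header):
        column_tokens = tokenize(name)
        for row in rows:
            column_tokens |= tokenize(row[col])
        scores.append((len(statement_tokens & column_tokens), col))
    # Select the top-scoring columns, then restore original order.
    keep = sorted(col for _, col in sorted(scores, reverse=True)[:max_columns])
    pruned_header = [header[c] for c in keep]
    pruned_rows = [[row[c] for c in keep] for row in rows]
    return pruned_header, pruned_rows


# Example: only the columns relevant to the statement survive pruning.
header = ["Player", "Team", "Goals", "Assists", "Nationality"]
rows = [["Alice", "Reds", "12", "3", "ES"], ["Bob", "Blues", "7", "9", "FR"]]
print(prune_columns("Alice scored 12 goals", header, rows, max_columns=2))
```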