Tables provide valuable knowledge that can be used to verify textual statements. Although a number of works have considered table-based fact verification, direct alignments between tabular data and the tokens of textual statements are rarely available. Moreover, training a generalized fact verification model requires abundant labeled training data. In this paper, we propose a novel system to address both problems. Inspired by counterfactual causality, our system identifies token-level salience in the statement with probing-based salience estimation. Salience estimation enhances the learning of fact verification from two perspectives. First, our system conducts masked salient token prediction to strengthen the model's alignment and reasoning between the table and the statement. Second, our system applies salience-aware data augmentation, which generates a more diverse set of training instances by replacing non-salient terms. Experimental results on TabFact show that the proposed salience-aware learning techniques yield effective improvements, leading to new SOTA performance on the benchmark. Our code is publicly available at https://github.com/luka-group/Salience-aware-Learning.
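To make the two salience-aware steps concrete, the following is a minimal sketch and not the released implementation. It assumes that salience is probed by masking one token at a time and measuring the drop in a verifier's confidence, and that augmentation swaps only tokens whose salience does not exceed a threshold. The names `token_salience`, `augment_non_salient`, and `toy_score`, the `[MASK]` placeholder, and the substitution dictionary are all hypothetical; `toy_score` is a stand-in for a real table-based verifier.

```python
from typing import Callable, Dict, List, Sequence

MASK = "[MASK]"  # hypothetical mask placeholder


def token_salience(
    tokens: Sequence[str],
    score: Callable[[Sequence[str]], float],
) -> List[float]:
    """Probe each token: salience = confidence drop when that token is masked."""
    base = score(tokens)
    saliences = []
    for i in range(len(tokens)):
        probed = list(tokens)
        probed[i] = MASK  # counterfactual statement without token i
        saliences.append(base - score(probed))
    return saliences


def augment_non_salient(
    tokens: Sequence[str],
    saliences: Sequence[float],
    substitutes: Dict[str, str],
    threshold: float = 0.0,
) -> List[str]:
    """Replace only tokens whose salience is at or below the threshold."""
    return [
        substitutes.get(tok, tok) if sal <= threshold else tok
        for tok, sal in zip(tokens, saliences)
    ]


def toy_score(tokens: Sequence[str]) -> float:
    """Assumed stand-in for a verifier: fraction of table-grounded terms present."""
    grounded = {"paris", "2008"}  # hypothetical cells from the table
    return sum(1 for g in grounded if g in tokens) / len(grounded)


statement = ["the", "game", "in", "paris", "was", "held", "in", "2008"]
saliences = token_salience(statement, toy_score)
augmented = augment_non_salient(
    statement, saliences, {"game": "match", "paris": "rome"}
)
print(saliences)
print(augmented)
```

In this toy setting, only the tokens that the stand-in verifier depends on receive non-zero salience, so the augmentation step rewrites "game" but leaves "paris" untouched even though a substitute is offered for it. A real system would replace `toy_score` with the confidence of a trained table-statement model.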