Investigating the Robustness of Natural Language Generation from Logical Forms via Counterfactual Samples

The aim of Logic2Text is to generate controllable and faithful text conditioned on tables and logical forms, which requires not only a deep understanding of the tables and logical forms but also symbolic reasoning over the tables. State-of-the-art methods based on pre-trained models have achieved remarkable performance on the standard test dataset. However, we question whether these methods really learn to perform logical reasoning, rather than just relying on spurious correlations between the table headers and the operators of the logical form. To verify this hypothesis, we manually construct a set of counterfactual samples, which modify the original logical forms to produce counterfactual logical forms whose table headers and logical operators rarely co-occur. SOTA methods perform much worse on these counterfactual samples than on the original test dataset, which verifies our hypothesis. To deal with this problem, we first analyze this bias from a causal perspective, based on which we propose two approaches to reduce the model's reliance on the shortcut. The first incorporates the hierarchical structure of the logical forms into the model. The second exploits automatically generated counterfactual data for training. Automatic and manual evaluation results on the original test dataset and the counterfactual dataset show that our method is effective at alleviating the spurious correlation. Our work points out the weakness of previous methods and takes a further step toward developing Logic2Text models with real logical reasoning ability.
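The idea of pairing a table header with an operator it rarely co-occurs with can be illustrated with a small sketch. This is not the authors' procedure (the abstract says the test-time counterfactual samples were built manually); it is a toy approximation under stated assumptions. The function names, the whitespace-tokenized logical-form string, and the tiny corpus below are all hypothetical, and the sketch does not re-check whether the modified logical form still holds on the table.

```python
from collections import Counter
from itertools import product


def cooccurrence_counts(samples):
    """Count how often each (header, operator) pair appears in the same sample.

    `samples` is an iterable of (headers, operators) pairs; this layout is an
    assumption made for the illustration.
    """
    counts = Counter()
    for headers, operators in samples:
        for pair in product(set(headers), set(operators)):
            counts[pair] += 1
    return counts


def rarest_operator(header, candidates, counts):
    """Return the candidate operator that co-occurs least often with `header`.

    Ties are broken alphabetically so the result is deterministic.
    """
    return min(candidates, key=lambda op: (counts[(header, op)], op))


def make_counterfactual(logical_form, header, old_op, candidates, counts):
    """Swap the first occurrence of `old_op` for the rarest alternative operator.

    Assumes tokens in the logical form are separated by single spaces and that
    every candidate has the same arity as `old_op`.
    """
    alternatives = [op for op in candidates if op != old_op]
    new_op = rarest_operator(header, alternatives, counts)
    tokens = logical_form.split(" ")
    tokens[tokens.index(old_op)] = new_op
    return " ".join(tokens)


# Hypothetical toy corpus: "attendance" mostly appears with "argmax".
toy_samples = [
    (["attendance"], ["argmax"]),
    (["attendance"], ["argmax"]),
    (["attendance"], ["argmin"]),
    (["rank"], ["argmin"]),
]
toy_counts = cooccurrence_counts(toy_samples)

original = "eq { hop { argmax { all_rows ; attendance } ; date } ; october 5 }"
counterfactual = make_counterfactual(
    original, "attendance", "argmax", ["argmax", "argmin"], toy_counts
)
print(counterfactual)
```

In a real setting, the cell value on the right-hand side of the logical form would also have to be updated so that the counterfactual form remains true for the table; that step is omitted here.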
