Logical table-to-text generation is the task of generating logically faithful sentences from tables, which requires models to derive logical-level facts from table records via logical inference. This poses a new challenge for the logical-level content planning of table-to-text models. However, directly learning logical inference knowledge from table-text pairs is very difficult for neural models because of the ambiguity of natural language and the scarcity of parallel data. Hence, even large-scale pretrained language models exhibit low logical fidelity on logical table-to-text generation. In this work, we propose PLOG (Pretrained Logical Form Generator), a framework for improving generation fidelity. Specifically, PLOG is first pretrained on a table-to-logic-form generation (table-to-logic) task and then finetuned on downstream table-to-text tasks. The formal definition of logical forms enables us to collect a large amount of accurate logical forms from tables without human annotation. In addition, PLOG can learn logical inference from table-logic pairs far more reliably than from table-text pairs. To evaluate our model, we further collect a controlled logical table-to-text dataset, CONTLOG, based on an existing dataset. On two benchmarks, LOGICNLG and CONTLOG, PLOG outperforms strong baselines by a large margin on logical fidelity, demonstrating the effectiveness of table-to-logic pretraining.
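To make the pretraining/finetuning distinction concrete, the sketch below shows what a table-to-logic pretraining pair versus a table-to-text finetuning pair might look like. This is an illustrative assumption, not an excerpt from the paper: the table, the Logic2Text-style logical form, and the sentence are all hypothetical.

```python
# Hypothetical example of the two kinds of training pairs described above.

# A small table, as it might be linearized for a seq2seq model.
table = {
    "title": "2008 summer olympics medal table",
    "header": ["nation", "gold", "silver", "bronze"],
    "rows": [
        ["united states", "36", "38", "36"],
        ["china", "48", "22", "30"],
        ["russia", "24", "13", "23"],
    ],
}

# Table-to-logic pretraining target: an executable logical form whose truth
# value can be checked against the table, so accurate (table, logical form)
# pairs can be collected at scale without human annotation.
logical_form = "eq { max { all_rows ; gold } ; 48 } = true"

# Downstream table-to-text finetuning target: a logically faithful sentence
# that requires the same inference (a superlative over the gold column).
sentence = "china won the most gold medals at the 2008 summer olympics."
```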