Logical Natural Language Generation, i.e., generating textual descriptions that are logically entailed by a structured table, has been challenging due to the low fidelity of the generated text. \citet{chen2020logic2text} addressed this problem by annotating interim logical programs to control the content and semantics of the generation, and presented the task of table-aware logical-form-to-text (Logic2text) generation. However, although table instances are abundant in the real world, logical forms paired with textual descriptions require costly human annotation, which limits the performance of neural models. To mitigate this, we propose topic-conditioned data augmentation (TopicDA), which utilizes GPT-2 to generate unpaired logical forms and textual descriptions directly from tables. We further introduce logical form generation (LG), a dual task of Logic2text that requires generating a valid logical form based on a textual description of a table. We also propose a semi-supervised learning approach to jointly train a Logic2text model and an LG model on both labeled and augmented data. The two models benefit from each other by providing extra supervision signals through back-translation. Experimental results on the Logic2text dataset and the LG task demonstrate that our approach can effectively utilize the augmented data and outperforms supervised baselines by a substantial margin.
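To make the joint training scheme concrete, the following is a minimal Python sketch of one round of semi-supervised back-translation between the two models. All names here (\texttt{StubSeq2Seq}, \texttt{train\_step}, \texttt{semi\_supervised\_round}) are hypothetical placeholders for illustration, not our actual implementation.

\begin{verbatim}
# Minimal sketch of semi-supervised back-translation between a
# Logic2text model (logical form -> text) and an LG model
# (text -> logical form). All classes and functions are
# hypothetical stand-ins, not the system's real code.

def train_step(model, source, target):
    """Hypothetical supervised update: fit `model` to map
    source -> target; returns a (stubbed) scalar loss."""
    return model.update(source, target)

class StubSeq2Seq:
    """Placeholder seq2seq model; a real system would use a
    pretrained encoder-decoder network here."""
    def update(self, source, target):
        return 0.0  # stand-in for one gradient step's loss
    def generate(self, source):
        return f"<pseudo output for: {source}>"

def semi_supervised_round(logic2text, lg, labeled,
                          unlabeled_texts, unlabeled_lfs):
    # 1) Supervised updates on labeled (logical form, text) pairs,
    #    training the two directions jointly.
    for lf, text in labeled:
        train_step(logic2text, lf, text)
        train_step(lg, text, lf)

    # 2) Back-translation on augmented, unpaired texts: LG produces
    #    a pseudo logical form, which then supervises Logic2text.
    for text in unlabeled_texts:
        pseudo_lf = lg.generate(text)
        train_step(logic2text, pseudo_lf, text)

    # 3) Symmetrically, Logic2text produces pseudo texts from
    #    augmented logical forms, which then supervise LG.
    for lf in unlabeled_lfs:
        pseudo_text = logic2text.generate(lf)
        train_step(lg, pseudo_text, lf)

if __name__ == "__main__":
    l2t, lg = StubSeq2Seq(), StubSeq2Seq()
    labeled = [("eq { max { all_rows ; points } ; 12 }",
                "the highest number of points is 12")]
    semi_supervised_round(
        l2t, lg, labeled,
        unlabeled_texts=["three teams scored over 10 points"],
        unlabeled_lfs=["count { filter_eq { all_rows ; "
                       "year ; 2020 } }"])
\end{verbatim}

In an actual system the stub models would be replaced by pretrained networks, and each \texttt{train\_step} would perform a gradient update on the corresponding (source, target) pair; the key point of the scheme is that each model's generations serve as pseudo-parallel supervision for its dual.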