We present a novel approach to data-to-text generation based on iterative text editing. Our approach maximizes the completeness and semantic accuracy of the output text while leveraging the abilities of recent pre-trained models for text editing (LaserTagger) and language modeling (GPT-2) to improve the text fluency. To this end, we first transform data items to text using trivial templates, and then we iteratively improve the resulting text by a neural model trained for the sentence fusion task. The output of the model is filtered by a simple heuristic and reranked with an off-the-shelf pre-trained language model. We evaluate our approach on two major data-to-text datasets (WebNLG, Cleaned E2E) and analyze its caveats and benefits. Furthermore, we show that our formulation of data-to-text generation opens up the possibility for zero-shot domain adaptation using a general-domain dataset for sentence fusion.
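The pipeline sketched in the abstract (template realization, iterative fusion, filtering, and LM reranking) can be illustrated with a toy sketch. Everything below is a hypothetical stand-in: `fuse` replaces the LaserTagger-based sentence-fusion model, `lm_score` replaces GPT-2 reranking, and the heuristic filter is reduced to picking the better-scored candidate; only the control flow mirrors the described approach.

```python
def template_realize(triple):
    """Turn an RDF-style (subject, predicate, object) data item
    into a trivial single-sentence template."""
    s, p, o = triple
    return f"{s} {p} {o}."

def fuse(sent_a, sent_b):
    """Toy stand-in for the neural sentence-fusion model: naively
    joins two sentences (a real model would merge them fluently)."""
    return sent_a.rstrip(".") + ", and " + sent_b

def lm_score(text):
    """Toy stand-in for GPT-2 reranking: fewer tokens = higher score
    (a real scorer would use language-model log-likelihood)."""
    return -len(text.split())

def generate(triples):
    """Iteratively fold each new template sentence into the running
    text, keeping whichever candidate the scorer prefers."""
    sentences = [template_realize(t) for t in triples]
    text = sentences[0]
    for sent in sentences[1:]:
        candidates = [text + " " + sent, fuse(text, sent)]
        text = max(candidates, key=lm_score)
    return text
```

For example, `generate([("Alice", "lives in", "Paris"), ("Alice", "works as", "a chef")])` produces a single text covering both facts; with real fusion and reranking models, each iteration would instead merge the new fact into the text fluently.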