Generating texts that express complex ideas spanning multiple sentences requires a structured representation of their content (a document plan), but these representations are prohibitively expensive to produce manually. In this work, we address the problem of generating coherent multi-sentence texts from the output of an information extraction system, and in particular from a knowledge graph. Graphical knowledge representations are ubiquitous in computing, but pose a significant challenge for text generation techniques due to their non-hierarchical nature, collapsing of long-distance dependencies, and structural variety. We introduce a novel graph transforming encoder which can leverage the relational structure of such knowledge graphs without imposing linearization or hierarchical constraints. Incorporating this encoder into an encoder-decoder setup, we provide an end-to-end trainable system for graph-to-text generation that we apply to the domain of scientific text. Automatic and human evaluations show that our technique produces more informative texts that exhibit better document structure than competitive encoder-decoder methods.