Recent approaches to data-to-text generation have adopted the very successful encoder-decoder architecture or variants thereof. These models generate text that is fluent (but often imprecise) and perform quite poorly at selecting appropriate content and ordering it coherently. To overcome some of these issues, we propose a neural model with a macro planning stage followed by a generation stage, reminiscent of traditional methods which embrace separate modules for planning and surface realization. Macro plans represent the high-level organization of important content such as entities, events, and their interactions; they are learnt from data and given as input to the generator. Extensive experiments on two data-to-text benchmarks (RotoWire and MLB) show that our approach outperforms competitive baselines in terms of both automatic and human evaluation.