Generating coherent long texts is difficult because existing models overwhelmingly focus on predicting local words: they cannot make high-level plans about what to generate, nor capture the high-level discourse dependencies between chunks of text. Inspired by the human writing process, in which a list of bullet points or a catalog is first outlined and each bullet point is then expanded to form the whole article, we propose {\it SOE}, a pipelined system for long text generation that consists of summarizing, outlining and elaborating: the model first outlines the summaries for the different segments of a long text, and then elaborates on each bullet point to generate the corresponding segment. To avoid the labor-intensive process of soliciting summaries, we propose the {\it reconstruction} strategy, which extracts segment summaries in an unsupervised manner by selecting, for each segment, the most informative part from which the segment can be reconstructed. The proposed generation system has the following merits: (1) the summary provides high-level guidance for text generation and avoids the local minimum of individual word predictions; (2) the high-level discourse dependencies are captured in the conditional dependencies between summaries and are preserved during the summary expansion process; and (3) we are able to consider significantly more context by representing it as concise summaries. Extensive experiments demonstrate that SOE produces long texts of significantly better quality, along with faster convergence.
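The {\it reconstruction} idea described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that "most informative" is approximated by how much of a segment's token mass a single candidate sentence covers; the abstract does not specify the scoring function, so `reconstruction_score` is a hypothetical stand-in rather than the scoring used by SOE, and every function name below is likewise an assumption.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokenizer (illustrative only)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def split_sentences(segment):
    """Naive sentence splitter on ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", segment)
    return [p.strip() for p in parts if p.strip()]


def reconstruction_score(candidate, segment_counts):
    """Hypothetical proxy: fraction of the segment's tokens that also
    occur in the candidate sentence. A learned model's likelihood of
    reconstructing the segment from the candidate could replace this."""
    candidate_vocab = set(tokenize(candidate))
    total = sum(segment_counts.values())
    covered = sum(n for tok, n in segment_counts.items() if tok in candidate_vocab)
    return covered / total if total else 0.0


def extract_summary(segment):
    """Pick the sentence that best 'reconstructs' its own segment."""
    sentences = split_sentences(segment)
    if not sentences:
        return ""
    counts = Counter(tokenize(segment))
    return max(sentences, key=lambda s: reconstruction_score(s, counts))


def outline(segments):
    """Summarize step: one extracted summary per segment, in order."""
    return [extract_summary(seg) for seg in segments]


if __name__ == "__main__":
    segments = [
        "Cats sleep a lot. Cats sleep during the day and hunt mice at night. Mice hide.",
        "Rain fell. The river rose after the rain fell all week.",
    ]
    for bullet in outline(segments):
        print("-", bullet)
```

The resulting list of extracted sentences plays the role of the outline; a generator would then be conditioned on each entry in turn to elaborate the corresponding segment. Note that this coverage proxy favors longer sentences, so a length penalty would likely be needed in practice.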