While GPT-2 generates sentences that are remarkably human-like, its longer documents can ramble and fail to follow human-like writing structure. We study the problem of imposing structure on long-range text. We propose a novel controlled text generation task, sequentially controlled text generation, and identify an existing dataset, NewsDiscourse, as a starting point for this task. We develop a sequentially controlled text generation pipeline with both generation and editing stages. We test different degrees of structural awareness and show that, in general, more structural awareness yields higher control accuracy, grammaticality, coherency, and topicality, approaching human-level writing performance.