The advent of large pre-trained language models has made it possible to make high-quality predictions about how to add or change a sentence in a document. However, the high branching factor inherent to text generation impedes the ability of even the strongest language models to offer useful editing suggestions at a more global or document level. We introduce a new task, document sketching, which involves generating entire draft documents for the writer to review and revise. These drafts are built from sets of documents that overlap in form, sharing large segments of potentially reusable text, while diverging in content. To support this task, we introduce a Wikipedia-based dataset of analogous documents and investigate the application of weakly supervised methods, including the use of a transformer-based mixture of experts together with reinforcement learning. We report experiments using automated and human evaluation methods and discuss the relative merits of these models.