Fast-developing fields such as Artificial Intelligence (AI) often outpace the efforts of encyclopedic sources such as Wikipedia, which either do not completely cover recently introduced topics or lack such content entirely. As a result, methods for automatically producing content are valuable tools for addressing this coverage gap. We show that recent advances in pretrained language modeling can be combined in a two-stage extractive and abstractive approach to generating Wikipedia lead paragraphs. We extend this approach to generate longer Wikipedia-style summaries with sections and examine how such methods struggle in this application through detailed studies against 100 human-collected reference surveys. To the best of our knowledge, this is the first study on utilizing web resources for long Wikipedia-style summary generation.
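A minimal sketch of what a two-stage extractive-then-abstractive pipeline of this kind could look like, assuming off-the-shelf pretrained models from sentence-transformers and Hugging Face Transformers; this is an illustrative approximation, not the system described in the paper, and the model names and helper function are hypothetical choices.

```python
# Hypothetical two-stage extract-then-abstract sketch (not the authors' exact system):
# Stage 1 ranks web passages by relevance to a topic query (extractive);
# Stage 2 rewrites the selected passages with a pretrained seq2seq model (abstractive).
from sentence_transformers import SentenceTransformer, util
from transformers import pipeline


def two_stage_summary(topic: str, passages: list[str], top_k: int = 5) -> str:
    # Stage 1 (extractive): embed the topic and candidate passages,
    # then keep the top-k passages most similar to the topic query.
    encoder = SentenceTransformer("all-MiniLM-L6-v2")
    topic_emb = encoder.encode(topic, convert_to_tensor=True)
    passage_embs = encoder.encode(passages, convert_to_tensor=True)
    scores = util.cos_sim(topic_emb, passage_embs)[0]
    ranked = sorted(range(len(passages)), key=lambda i: float(scores[i]), reverse=True)
    selected = " ".join(passages[i] for i in ranked[:top_k])

    # Stage 2 (abstractive): compress the selected text into a short,
    # lead-paragraph-style summary. Long inputs may need truncation or chunking,
    # since BART-style models accept roughly 1024 tokens.
    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
    result = summarizer(selected, max_length=150, min_length=60, do_sample=False)
    return result[0]["summary_text"]


if __name__ == "__main__":
    docs = ["...web passage about the topic...", "...another passage..."]
    print(two_stage_summary("Transfer learning", docs))
```

Generating a full sectioned survey rather than a single lead paragraph would, under this sketch, amount to repeating the two stages once per section heading with section-specific queries.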