While pre-trained language models can generate individually fluent sentences for automatic story generation, they struggle to generate stories that are coherent, sensible and interesting. Current state-of-the-art (SOTA) story generation models explore using higher-level features such as plots or commonsense knowledge to improve the quality of generated stories. Prompt-based learning using very large pre-trained language models (VLPLMs) such as GPT3 has demonstrated impressive performance across various NLP tasks. In this paper, we present an extensive study using automatic and human evaluation to compare the story generation capability of VLPLMs with that of SOTA models on three different datasets where stories differ in style, register and length. Our results show that VLPLMs generate much higher quality stories than other story generation models and, to a certain extent, rival human authors, although preliminary investigation also reveals that they tend to ``plagiarise'' real stories in scenarios that involve world knowledge.