While pre-trained language models can generate individually fluent sentences for automatic story generation, they struggle to generate stories that are coherent, sensible, and interesting. Current state-of-the-art (SOTA) story generation models explore using higher-level features such as plots or commonsense knowledge to improve the quality of generated stories. Prompt-based learning using very large pre-trained language models (VLPLMs) such as GPT3 has demonstrated impressive performance across a variety of NLP tasks. In this paper, we present an extensive study using automatic and human evaluation to compare the story generation capability of VLPLMs with that of SOTA models on three different datasets whose stories differ in style, register, and length. Our results show that VLPLMs generate stories of much higher quality than other story generation models, and to a certain extent rival human authors, although a preliminary investigation also reveals that they tend to ``plagiarise'' real stories in scenarios that involve world knowledge.