Narrative generation is an open-ended NLP task in which a model generates a story given a prompt. The task is similar to neural response generation for chatbots; however, innovations in response generation are often not applied to narrative generation, despite the similarity between these tasks. We aim to bridge this gap by applying and evaluating advances in decoding methods for neural response generation to neural narrative generation. In particular, we employ GPT-2 and perform ablations across nucleus sampling thresholds and diverse decoding hyperparameters -- specifically, maximum mutual information -- analyzing results over multiple criteria with automatic and human evaluation. We find that (1) nucleus sampling is generally best with thresholds between 0.7 and 0.9; (2) a maximum mutual information objective can improve the quality of generated stories; and (3) established automatic metrics do not correlate well with human judgments of narrative quality on any qualitative metric.
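The nucleus (top-p) sampling step the abstract evaluates can be sketched in a few lines: keep only the smallest set of highest-probability tokens whose cumulative mass reaches the threshold p, then renormalize and sample from that set. This is a minimal illustration, not the paper's implementation; the toy token distribution below is invented for the example.

```python
def nucleus_filter(probs, p=0.9):
    """Keep the smallest set of tokens whose cumulative probability
    reaches p (the 'nucleus'), then renormalize.
    `probs` maps token -> probability."""
    # Rank tokens by probability, highest first.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus, total = [], 0.0
    for tok, pr in ranked:
        nucleus.append((tok, pr))
        total += pr
        if total >= p:  # stop once the mass threshold is reached
            break
    # Renormalize the surviving tokens so they sum to 1.
    return {tok: pr / total for tok, pr in nucleus}

# Toy next-token distribution (hypothetical values, for illustration only).
dist = {"the": 0.5, "a": 0.3, "dragon": 0.15, "xylophone": 0.05}
print(nucleus_filter(dist, p=0.8))  # low-probability tail ("xylophone") is cut
```

With p = 0.8, only "the" and "a" survive (their mass of 0.8 meets the threshold) and are renormalized to 0.625 and 0.375; lower thresholds restrict generation to fewer, safer tokens, which is the trade-off the 0.7–0.9 range in the abstract reflects.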