The advent of large pre-trained generative language models has provided a common framework for AI story generation: sample the model to create sequences that continue the story. However, sampling alone is insufficient for story generation; in particular, it is hard to direct a language model to create stories that reach a specific goal event. We present two automated techniques grounded in deep reinforcement learning and reward shaping to control the plot of computer-generated stories. The first uses proximal policy optimization to fine-tune an existing transformer-based language model so that it continues to generate fluent text continuations while also being goal-seeking. The second extracts a knowledge graph from the unfolding story, which a policy network with graph attention uses to select a candidate continuation generated by a language model. We report automated metrics on how often stories achieve a given goal event, as well as human participant rankings of coherence and overall story quality, compared against baselines and ablations.
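To make the second idea concrete, here is a minimal, hypothetical sketch (not the paper's implementation): a policy network scores candidate continuations sampled from a language model and is trained with REINFORCE, using a shaped reward that gives partial credit for approaching a goal event. The paper's method uses graph attention over a knowledge graph extracted from the story; this sketch substitutes a plain feed-forward scorer over bag-of-words features, and the candidate generator is a stand-in stub rather than a real language model.

```python
# Hypothetical sketch: policy-based selection of LM-generated candidate
# continuations, trained with REINFORCE and a shaped goal-event reward.
import random
import torch
import torch.nn as nn

VOCAB = ["hero", "princess", "dragon", "castle", "travels", "fights", "rescues", "sleeps"]
GOAL_EVENT = {"rescues", "princess"}            # target event the story should reach
W2I = {w: i for i, w in enumerate(VOCAB)}

def featurize(sentence):
    """Bag-of-words feature vector for a candidate continuation."""
    x = torch.zeros(len(VOCAB))
    for w in sentence.split():
        if w in W2I:
            x[W2I[w]] = 1.0
    return x

def generate_candidates(k=4):
    """Stand-in for sampling k continuations from a language model."""
    return [" ".join(random.sample(VOCAB, 3)) for _ in range(k)]

def shaped_reward(sentence):
    """Reward shaping: partial credit for each goal token mentioned."""
    return sum(w in sentence.split() for w in GOAL_EVENT) / len(GOAL_EVENT)

class CandidateScorer(nn.Module):
    """Scores each candidate; a softmax over scores defines the selection policy."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, feats):                   # feats: (k, dim)
        return self.net(feats).squeeze(-1)      # (k,) unnormalized scores

policy = CandidateScorer(len(VOCAB))
optim = torch.optim.Adam(policy.parameters(), lr=1e-2)

for step in range(200):
    candidates = generate_candidates()
    feats = torch.stack([featurize(c) for c in candidates])
    dist = torch.distributions.Categorical(logits=policy(feats))
    choice = dist.sample()                      # pick a continuation stochastically
    reward = shaped_reward(candidates[choice])
    loss = -dist.log_prob(choice) * reward      # REINFORCE policy-gradient objective
    optim.zero_grad()
    loss.backward()
    optim.step()
```

The first technique in the abstract replaces this external selection policy with proximal policy optimization applied directly to the language model's own sampling distribution, using an analogous shaped reward; the sketch above only illustrates the shared reward-shaping and policy-gradient structure under the stated toy assumptions.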