AI researchers have posited Dungeons and Dragons (D&D) as a challenge problem to test systems on various language-related capabilities. In this paper, we frame D&D specifically as a dialogue system challenge, where the tasks are to both generate the next conversational turn in the game and predict the state of the game given the dialogue history. We create a gameplay dataset consisting of nearly 900 games, with a total of 7,000 players, 800,000 dialogue turns, 500,000 dice rolls, and 58 million words. We automatically annotate the data with partial state information about the gameplay. We train a large language model (LM) to generate the next game turn, conditioning it on different information. The LM can respond as a particular character or as the player who runs the game (i.e., the Dungeon Master, or DM). It is trained to produce dialogue that is either in-character (roleplaying in the fictional world) or out-of-character (discussing rules or strategy). We perform a human evaluation to determine what factors make the generated output plausible and interesting. We further perform an automatic evaluation to determine how well the model can predict the game state given the history, and examine how well tracking the game state improves its ability to produce plausible conversational output.