Dungeons & Dragons (D&D) is a tabletop roleplaying game with complex natural language interactions between players and hidden state information. Recent work has shown that large language models (LLMs) with access to state information can generate higher-quality game turns than LLMs that use dialog history alone. However, previous work used game state information that was heuristically created and was not a true gold-standard game state. We present FIREBALL, a large dataset containing nearly 25,000 unique sessions from real D&D gameplay on Discord with true game state information. We recorded gameplay sessions of players who used the Avrae bot, which was developed to aid people in playing D&D online, capturing language, game commands, and underlying game state information. We demonstrate that FIREBALL can improve natural language generation (NLG) by using Avrae state information, improving both automated metrics and human judgments of quality. Additionally, we show that LLMs can generate executable Avrae commands, particularly after finetuning.