We propose a novel task, G4C (Goal-driven Guidance Generation in Grounded Communication), for studying goal-driven and grounded natural language interactions. Specifically, we use Dungeons and Dragons (D&D) as a testbed for this task: a role-playing game in which multiple player characters and a Dungeon Master (DM) collaborate to achieve a set of goals that benefit the players. Here, each player character is a student with their own persona and abilities, and the DM is the teacher: the arbiter of the world's rules, responsible for assisting and guiding the students towards a global goal. We propose a theory-of-mind-inspired methodology for training such a DM with reinforcement learning (RL), in which the DM (1) learns to predict how the players will react to its utterances, using a dataset of D&D dialogue transcripts; and (2) uses this prediction as a reward function that provides feedback on how effectively these utterances guide the players towards a goal. Human and automated evaluations show that a DM trained with RL to generate guidance by incorporating a theory of mind of the players significantly improves the players' ability to achieve goals grounded in their shared world.
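The two-step training idea above (predict player reactions, then use that prediction as a reward) can be illustrated with a toy sketch. The abstract does not specify the RL algorithm, the exact form of the reward, or the player model's architecture, so everything below is an assumption for illustration only. The hypothetical `PLAYER_MODEL` lookup table stands in for the learned player-reaction model. The reward is assumed to be the predicted probability that the players take a goal action. The DM is reduced to a softmax policy over a fixed set of candidate utterances, trained by exact policy-gradient ascent rather than being a language model trained on sampled rollouts.

```python
import math

# HYPOTHETICAL stand-in for step (1): a player-reaction model mapping a DM
# utterance to a probability distribution over the players' next action.
# In the described method this would be learned from D&D transcripts; here it
# is a hand-written table, and the utterances and action labels are invented.
PLAYER_MODEL: dict[str, dict[str, float]] = {
    "The door is locked, but one of you looks handy with tools.": {
        "pick_lock": 0.7, "attack": 0.1, "talk": 0.2},
    "You hear a low growl from behind the door.": {
        "pick_lock": 0.2, "attack": 0.6, "talk": 0.2},
    "What do you do?": {
        "pick_lock": 0.3, "attack": 0.3, "talk": 0.4},
}


def reward(utterance: str, goal: str) -> float:
    """Step (2), ASSUMED form: the predicted probability that the players
    take the goal action after hearing this utterance."""
    return PLAYER_MODEL[utterance].get(goal, 0.0)


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_dm_policy(utterances: list[str], goal: str,
                    steps: int = 200, lr: float = 1.0) -> list[float]:
    """Gradient ascent on the expected reward J = sum_k pi_k * r_k of a
    softmax policy over candidate utterances. Uses the exact gradient
    dJ/dlogit_k = pi_k * (r_k - J), which REINFORCE would instead estimate
    by sampling utterances."""
    rewards = [reward(u, goal) for u in utterances]
    logits = [0.0] * len(utterances)
    for _ in range(steps):
        probs = softmax(logits)
        expected = sum(p * r for p, r in zip(probs, rewards))
        logits = [logit + lr * p * (r - expected)
                  for logit, p, r in zip(logits, probs, rewards)]
    return softmax(logits)


if __name__ == "__main__":
    candidates = list(PLAYER_MODEL)
    probs = train_dm_policy(candidates, goal="pick_lock")
    for utterance, p in zip(candidates, probs):
        print(f"{p:.3f}  {utterance}")
```

Because the first candidate has the highest assumed reward for the `pick_lock` goal, training shifts nearly all of the policy's probability mass onto it. In a setting closer to the one the abstract describes, the table would presumably be replaced by the learned reaction model and the fixed candidate list by utterances the DM generates.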