The Werewolf game is a social deduction game based on free natural language communication, in which players try to deceive others in order to survive. An important feature of this game is that a large portion of the conversation consists of false information, and the behavior of artificial intelligence (AI) in such a situation has not been widely investigated. The purpose of this study is to develop an AI agent that can play Werewolf through natural language conversations. First, we collected game logs from 15 human players. Next, we fine-tuned a Transformer-based pretrained language model to construct a value network that, given any phase of the game and a candidate for the next action, predicts the posterior probability of winning the game. We then developed an AI agent that can interact with humans and chooses the best voting target on the basis of the winning probability estimated by the value network. Lastly, we evaluated the performance of the agent by having it play the game with human players. We found that our AI agent, Deep Wolf, could play Werewolf as competitively as average human players in the villager or betrayer role, whereas it was inferior to human players in the werewolf or seer role. These results suggest that current language models have the capability to be suspicious of what others say, tell lies, or detect lies in conversations.
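The voting step described above (score each candidate action with the value network, then pick the best one) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the value network is assumed to be exposed as a callable taking the game log text and a candidate target and returning an estimated win probability, and the names `choose_vote_target` and `toy_value_fn` are hypothetical. The toy scorer merely counts accusations in the log and stands in for the fine-tuned Transformer.

```python
from typing import Callable, Sequence

# Assumed interface (hypothetical): (game log so far, candidate vote target) -> P(win).
ValueFn = Callable[[str, str], float]


def choose_vote_target(game_log: str, candidates: Sequence[str], value_fn: ValueFn) -> str:
    """Return the candidate whose vote gives the highest predicted win probability."""
    if not candidates:
        raise ValueError("no voting candidates")
    # Greedy selection: evaluate every candidate action and keep the argmax
    # (ties resolve to the earliest candidate in the sequence).
    return max(candidates, key=lambda c: value_fn(game_log, c))


def toy_value_fn(game_log: str, candidate: str) -> float:
    # Hypothetical stand-in for the value network: more accusations against
    # a player make voting for that player look more favorable.
    accusations = game_log.count(f"I suspect {candidate}")
    return accusations / (1 + accusations)


if __name__ == "__main__":
    log = "Alice: I suspect Bob. Carol: I suspect Bob. Bob: I suspect Carol."
    print(choose_vote_target(log, ["Bob", "Carol"], toy_value_fn))
```

With the toy log, "Bob" is accused twice and "Carol" once, so the sketch votes for "Bob". Swapping in a real model only requires a callable with the same signature.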