Although most reinforcement learning (RL) research has centered on competitive games, little work has applied it to co-operative multiplayer games or text-based games. Codenames is a board game that involves both asymmetric co-operation and natural language processing, which makes it an excellent candidate for advancing RL research. To my knowledge, this work is the first to formulate Codenames as a Markov Decision Process and to apply well-known reinforcement learning algorithms such as SAC, PPO, and A2C to the environment. None of these algorithms converges on the Codenames environment, nor do they converge on a simplified environment called ClickPixel, except when the board size is small.