In 2021, the Johns Hopkins University Applied Physics Laboratory held an internal challenge to develop artificially intelligent (AI) agents that could excel at the collaborative card game Hanabi. Agents were evaluated on their ability to play with human players whom they had never previously encountered. This study details the development of the agent that won the challenge by achieving a human-play average score of 16.5, outperforming the current state of the art for human-bot Hanabi scores. The winning agent was developed by observing and accurately modeling the author's decision making in Hanabi, then training with a behavioral clone of the author. Notably, the agent discovered a human-complementary play style by first mimicking human decision making, then exploring variations on the human-like strategy that led to higher simulated human-bot scores. This work examines in detail the design and implementation of this human-compatible Hanabi teammate, as well as the existence and implications of human-complementary strategies and how they may be explored for more successful applications of AI in human-machine teams.