Deep reinforcement learning has generated superhuman AI in competitive games such as Go and StarCraft. Can similar learning techniques create a superior AI teammate for human-machine collaborative games? Will humans prefer AI teammates that improve objective team performance, or those that improve subjective metrics of trust? In this study, we perform a single-blind evaluation of teams of humans and AI agents in the cooperative card game \emph{Hanabi}, with both rule-based and learning-based agents. In addition to the game score, used as an objective metric of human-AI team performance, we also quantify subjective measures of the humans' perceived performance, teamwork, interpretability, trust, and overall preference for the AI teammate. We find that humans have a clear preference for a rule-based AI teammate (SmartBot) over a state-of-the-art learning-based AI teammate (Other-Play) across nearly all subjective metrics, and generally view the learning-based agent negatively, despite there being no statistically significant difference in game score. This result has implications for future AI design and reinforcement learning benchmarking, highlighting the need to incorporate subjective metrics of human-AI teaming rather than focusing solely on objective task performance.