Deep reinforcement learning has generated superhuman AI in competitive games such as Go and StarCraft. Can similar learning techniques create a superior AI teammate for human-machine collaborative games? Will humans prefer AI teammates that improve objective team performance, or those that improve subjective metrics of trust? In this study, we perform a single-blind evaluation of teams of humans and AI agents in the cooperative card game Hanabi, with both rule-based and learning-based agents. In addition to the game score, used as an objective metric of human-AI team performance, we also quantify subjective measures of the human's perceived performance, teamwork, interpretability, trust, and overall preference for the AI teammate. We find that humans have a clear preference for a rule-based AI teammate (SmartBot) over a state-of-the-art learning-based AI teammate (Other-Play) across nearly all subjective metrics, and generally view the learning-based agent negatively, despite no statistically significant difference in game score. This result has implications for future AI design and reinforcement learning benchmarking, highlighting the need to incorporate subjective metrics of human-AI teaming rather than focusing solely on objective task performance.