Cooperative artificial intelligence with human or superhuman proficiency in collaborative tasks stands at the frontier of machine learning research. Prior work has tended to evaluate cooperative AI performance under the restrictive paradigms of self-play (teams composed of agents trained together) and cross-play (teams of agents trained independently but using the same algorithm). Recent work has indicated that AI optimized for these narrow settings may make for undesirable collaborators in the real world. We formalize an alternative criterion for evaluating cooperative AI, referred to as inter-algorithm cross-play, in which agents are evaluated on teaming performance with all other agents in an experiment pool, with no assumption of algorithmic similarity between agents. We show that existing state-of-the-art cooperative AI algorithms, such as Other-Play and Off-Belief Learning, underperform in this paradigm. We propose the Any-Play learning augmentation -- a multi-agent extension of diversity-based intrinsic rewards for zero-shot coordination (ZSC) -- for generalizing self-play-based algorithms to the inter-algorithm cross-play setting. We apply the Any-Play learning augmentation to the Simplified Action Decoder (SAD) and demonstrate state-of-the-art performance in the collaborative card game Hanabi.
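To make the diversity-based intrinsic reward mentioned above concrete, the following is a minimal sketch of a DIAYN-style skill-discrimination objective of the kind the abstract describes, conditioned on a latent "play style" shared by a team. All names here (SkillDiscriminator, intrinsic_reward, num_styles) are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkillDiscriminator(nn.Module):
    """Predicts which latent play style z produced an observed state.

    Hypothetical component: a classifier over the population's latent styles.
    """
    def __init__(self, obs_dim: int, num_styles: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_styles),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Logits over the num_styles latent styles.
        return self.net(obs)

def intrinsic_reward(disc: SkillDiscriminator,
                     obs: torch.Tensor,
                     z: torch.Tensor,
                     num_styles: int) -> torch.Tensor:
    """r_int = log q(z | obs) - log p(z), with p(z) uniform over styles.

    A team conditioned on style z earns intrinsic reward when its behavior
    is distinguishable from the other styles, pushing the population toward
    a diverse set of conventions rather than a single brittle self-play policy.
    """
    log_q = F.log_softmax(disc(obs), dim=-1)
    log_q_z = log_q.gather(-1, z.unsqueeze(-1)).squeeze(-1)
    log_p_z = -torch.log(torch.tensor(float(num_styles)))
    return log_q_z - log_p_z
```

In training, this term would typically be mixed with the task reward (e.g., total = extrinsic + lambda * intrinsic, with lambda a tunable weight); the weighting and multi-agent details here are assumptions for illustration.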