We propose Searching with Opponent-Awareness (SOA), an approach that leverages opponent-aware planning without explicit or a priori opponent models to improve performance and social welfare in multi-agent systems. To this end, we develop an opponent-aware Monte Carlo tree search (MCTS) scheme that uses multi-armed bandits based on Learning with Opponent-Learning Awareness (LOLA), and we compare its effectiveness with other bandit algorithms, including UCB1. Our evaluations cover several different settings and show that the benefits of SOA become especially evident as the number of agents increases.
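The abstract does not spell out the LOLA-based bandit, so the sketch below shows only the standard UCB1 rule named as a comparison baseline, in the form commonly used to pick a child action during MCTS selection. It is a minimal illustration, not the authors' implementation: the helper names `ucb1_select` and `run_bandit` are hypothetical, and the Bernoulli reward arms in the usage loop are an assumption made purely for demonstration.

```python
import math
import random


def ucb1_select(counts, values, c=math.sqrt(2)):
    """Return the index of the arm (child action) that UCB1 would pick.

    counts[i]: number of times arm i has been pulled (child visit count)
    values[i]: sum of rewards observed for arm i
    c: exploration constant; sqrt(2) yields the classic UCB1 bonus
       sqrt(2 ln N / n_i)
    """
    # UCB1 requires every arm to be tried once before scores are defined.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    total = sum(counts)

    def score(i):
        mean = values[i] / counts[i]  # empirical mean reward of arm i
        bonus = c * math.sqrt(math.log(total) / counts[i])  # exploration term
        return mean + bonus

    return max(range(len(counts)), key=score)


def run_bandit(probs, steps, seed=0):
    """Hypothetical demo: run UCB1 on Bernoulli arms and return pull counts."""
    rng = random.Random(seed)
    counts = [0] * len(probs)
    values = [0.0] * len(probs)
    for _ in range(steps):
        arm = ucb1_select(counts, values)
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += reward
    return counts


if __name__ == "__main__":
    # The arm with the higher success probability should receive most pulls.
    print(run_bandit([0.2, 0.8], 2000))
```

In an MCTS setting, `counts` and `values` would correspond to the visit counts and accumulated returns of a node's children; an opponent-aware variant such as SOA would replace this selection rule, which is what the comparison in the abstract refers to.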