When one agent interacts with a multi-agent environment, it is challenging to deal with opponents it has never seen before. Modeling the behaviors, goals, or beliefs of opponents could help the agent adjust its policy to adapt to different opponents. In addition, it is also important to consider opponents that are learning simultaneously or are capable of reasoning. However, existing work usually tackles only one of the aforementioned types of opponents. In this paper, we propose model-based opponent modeling (MBOM), which employs the environment model to adapt to all kinds of opponents. MBOM simulates the recursive reasoning process in the environment model and imagines a set of improving opponent policies. To represent the opponent policy effectively and accurately, MBOM further mixes the imagined opponent policies according to their similarity with the real behaviors of opponents. Empirically, we show that MBOM achieves more effective adaptation than existing methods in a variety of tasks, against different types of opponents, i.e., fixed policy, na\"ive learner, and reasoning learner.
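As a purely illustrative sketch of the mixing idea described above (not the paper's actual implementation), one could weight each imagined opponent policy by the likelihood it assigns to the opponent's observed actions, then form a weighted mixture. All function and variable names here are hypothetical:

```python
import numpy as np

def mixing_weights(imagined_policies, observed_actions):
    """Weight each imagined policy by how well it explains real behavior.

    imagined_policies: list of K arrays, each of shape (T, A), giving the
        action probabilities an imagined policy assigns at T time steps.
    observed_actions: length-T integer array of the opponent's real actions.
    Returns normalized weights favoring policies similar to real behavior.
    """
    t = np.arange(len(observed_actions))
    # Log-likelihood of the observed action sequence under each policy.
    log_liks = np.array([
        np.sum(np.log(pi[t, observed_actions] + 1e-12))
        for pi in imagined_policies
    ])
    w = np.exp(log_liks - log_liks.max())  # stabilize before normalizing
    return w / w.sum()

def mixed_policy(imagined_policies, weights, step):
    """Weighted mixture of the imagined policies at one time step."""
    return sum(w * pi[step] for w, pi in zip(weights, imagined_policies))
```

Under this sketch, an imagined policy that consistently assigns high probability to the opponent's actual actions dominates the mixture, which is the intuition behind mixing by similarity to real behavior.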