Learning anticipation is a reasoning paradigm in multi-agent reinforcement learning in which agents, while learning, take into account the anticipated learning of other agents. There has been substantial research into the role of learning anticipation in improving cooperation among self-interested agents in general-sum games. Two primary examples are Learning with Opponent-Learning Awareness (LOLA), which anticipates and shapes the opponent's learning process to ensure cooperation among self-interested agents in games such as the iterated prisoner's dilemma, and Look-Ahead (LA), which uses learning anticipation to guarantee convergence in games with cyclic behaviors. So far, the effectiveness of applying learning anticipation to fully-cooperative games has not been explored. In this study, we investigate the influence of learning anticipation on coordination among agents with common interests. We first show that both LOLA and LA, when applied to fully-cooperative games, degrade coordination among agents and lead to worst-case outcomes. To overcome this miscoordination, we then propose Hierarchical Learning Anticipation (HLA), in which agents anticipate the learning of other agents in a hierarchical fashion. Specifically, HLA assigns agents to several hierarchy levels to properly regulate their reasoning. Our theoretical and empirical findings confirm that HLA can significantly improve coordination among agents with common interests in fully-cooperative normal-form games. To the best of our knowledge, with HLA we are the first to unlock the benefits of learning anticipation for fully-cooperative games.
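As a rough illustration of what learning anticipation means in practice, the following sketch contrasts a naive gradient step with a look-ahead style step in a two-agent, two-action game with a shared payoff. It is a minimal sketch under stated assumptions, not the authors' implementation: the sigmoid policy parameterization, the payoff matrix `R`, the step sizes `lr` and `eta`, and all function names are hypothetical choices made for this example. The look-ahead step evaluates each agent's gradient at the other agent's predicted parameters after one anticipated naive step, without differentiating through that anticipated step.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def value(theta, R):
    """Shared expected payoff; theta[i] is agent i's logit for action 0."""
    p, q = sigmoid(theta)
    return np.array([p, 1.0 - p]) @ R @ np.array([q, 1.0 - q])


def grad(theta, R):
    """Analytic gradient of the shared payoff w.r.t. both agents' logits."""
    p, q = sigmoid(theta)
    dV_dp = np.array([1.0, -1.0]) @ R @ np.array([q, 1.0 - q])
    dV_dq = np.array([p, 1.0 - p]) @ R @ np.array([1.0, -1.0])
    # Chain rule through the sigmoid: dp/dtheta = p * (1 - p).
    return np.array([dV_dp * p * (1.0 - p), dV_dq * q * (1.0 - q)])


def naive_step(theta, R, lr):
    """Each agent ascends its own gradient, ignoring the other's learning."""
    return theta + lr * grad(theta, R)


def lookahead_step(theta, R, lr, eta):
    """Each agent first predicts the other's next parameters (one naive step
    of size eta), then ascends its own gradient evaluated at that prediction.
    The prediction is treated as a constant (assumption for this sketch)."""
    g = grad(theta, R)
    seen_by_agent_1 = np.array([theta[0], theta[1] + eta * g[1]])
    seen_by_agent_2 = np.array([theta[0] + eta * g[0], theta[1]])
    g1 = grad(seen_by_agent_1, R)[0]
    g2 = grad(seen_by_agent_2, R)[1]
    return theta + lr * np.array([g1, g2])


# Hypothetical fully-cooperative payoff: both agents receive the same reward.
R = np.array([[1.0, 0.0],
              [0.0, 1.0]])
theta0 = np.array([0.2, -0.1])
print("naive:     ", naive_step(theta0, R, lr=0.5))
print("look-ahead:", lookahead_step(theta0, R, lr=0.5, eta=1.0))
```

Setting `eta=0` recovers the naive update, which makes the anticipation term easy to isolate when comparing the two update rules on a given payoff matrix.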