Coordinating Fully-Cooperative Agents Using Hierarchical Learning Anticipation

Learning anticipation is a reasoning paradigm in multi-agent reinforcement learning in which agents, while learning, take into account the anticipated learning of other agents. There has been substantial research into the role of learning anticipation in improving cooperation among self-interested agents in general-sum games. Two primary examples are Learning with Opponent-Learning Awareness (LOLA), which anticipates and shapes the opponent's learning process to ensure cooperation among self-interested agents in games such as the iterated prisoner's dilemma, and Look-Ahead (LA), which uses learning anticipation to guarantee convergence in games with cyclic behaviors. So far, the effectiveness of applying learning anticipation to fully-cooperative games has not been explored. In this study, we investigate the influence of learning anticipation on coordination among common-interest agents. We first illustrate that both LOLA and LA, when applied to fully-cooperative games, degrade coordination among agents, causing worst-case outcomes. Subsequently, to overcome this miscoordination, we propose Hierarchical Learning Anticipation (HLA), in which agents anticipate the learning of other agents in a hierarchical fashion. Specifically, HLA assigns agents to several hierarchy levels to properly regulate their reasoning. Our theoretical and empirical findings confirm that HLA can significantly improve coordination among common-interest agents in fully-cooperative normal-form games. To the best of our knowledge, with HLA we are the first to unlock the benefits of learning anticipation for fully-cooperative games.
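To make the idea of learning anticipation concrete, the following is a minimal sketch of a look-ahead style update in a two-agent, two-action common-payoff game. It is not the HLA method, and it is not taken from the paper: the sigmoid policy parameterization, the function names (`value`, `grad_a`, `grad_b`, `look_ahead_step`), the payoff matrix `R`, and the step sizes `lr` and `eta` are all illustrative assumptions. Each agent first predicts the other agent's naive gradient step and then evaluates its own gradient at those anticipated parameters, without differentiating through the prediction.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def value(a, b, R):
    """Expected common payoff; sigmoid(a), sigmoid(b) are P(action 0) per agent."""
    p, q = sigmoid(a), sigmoid(b)
    return (p * q * R[0][0] + p * (1 - q) * R[0][1]
            + (1 - p) * q * R[1][0] + (1 - p) * (1 - q) * R[1][1])


def grad_a(a, b, R):
    """Gradient of the common payoff with respect to agent 1's logit a."""
    p, q = sigmoid(a), sigmoid(b)
    dV_dp = q * (R[0][0] - R[1][0]) + (1 - q) * (R[0][1] - R[1][1])
    return dV_dp * p * (1 - p)  # chain rule through the sigmoid


def grad_b(a, b, R):
    """Gradient of the common payoff with respect to agent 2's logit b."""
    p, q = sigmoid(a), sigmoid(b)
    dV_dq = p * (R[0][0] - R[0][1]) + (1 - p) * (R[1][0] - R[1][1])
    return dV_dq * q * (1 - q)  # chain rule through the sigmoid


def look_ahead_step(a, b, R, lr=0.1, eta=1.0):
    """One simultaneous look-ahead style update (illustrative assumption).

    Each agent predicts the other's naive gradient step of size eta, then
    takes its own gradient step of size lr at the anticipated parameters.
    With eta = 0 this reduces to plain simultaneous gradient ascent.
    """
    b_anticipated = b + eta * grad_b(a, b, R)  # agent 1's prediction of agent 2
    a_anticipated = a + eta * grad_a(a, b, R)  # agent 2's prediction of agent 1
    new_a = a + lr * grad_a(a, b_anticipated, R)
    new_b = b + lr * grad_b(a_anticipated, b, R)
    return new_a, new_b
```

In this sketch both agents anticipate each other symmetrically. The abstract's proposal instead places agents on different hierarchy levels to regulate this reasoning; the exact update rules for that are not given here and are not reproduced in the sketch.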
