In cooperative multi-agent reinforcement learning (MARL), where agents have access only to partial observations, making efficient use of local information is critical. Over long periods of observation, agents can build \textit{awareness} of their teammates to alleviate partial observability. However, previous MARL methods usually neglect this use of local information. To address this problem, we propose a novel framework, multi-agent \textit{Local INformation Decomposition for Awareness of teammates} (LINDA), with which agents learn to decompose local information and build awareness of each teammate. We model awareness as random variables and perform representation learning to keep the awareness representations informative, by maximizing the mutual information between an agent's awareness of a teammate and that teammate's actual trajectory. LINDA is agnostic to the specific algorithm and can be flexibly integrated into different MARL methods. Extensive experiments show that the proposed framework learns informative awareness from local partial observations, which leads to better collaboration and significantly improves learning performance, especially on challenging tasks.
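The mutual-information objective mentioned above can be illustrated with a standard variational lower bound. This is a sketch under assumed notation, not necessarily the exact objective used in LINDA: $\tau_i$ denotes the local trajectory of agent $i$, $c^i_j \sim p(c^i_j \mid \tau_i)$ is the awareness that agent $i$ holds of teammate $j$, $\tau_j$ is the actual trajectory of teammate $j$, and $q_\xi$ is a hypothetical learned approximation of the posterior over $c^i_j$.

\[
I\left(c^i_j ; \tau_j \mid \tau_i\right)
= \mathbb{E}\left[\log \frac{p\left(c^i_j \mid \tau_i, \tau_j\right)}{p\left(c^i_j \mid \tau_i\right)}\right]
\;\ge\;
\mathbb{E}\left[\log \frac{q_\xi\left(c^i_j \mid \tau_i, \tau_j\right)}{p\left(c^i_j \mid \tau_i\right)}\right]
\]

The inequality holds because the gap between the two sides is an expected KL divergence, $\mathbb{E}\left[D_{\mathrm{KL}}\left(p(c^i_j \mid \tau_i, \tau_j) \,\|\, q_\xi(c^i_j \mid \tau_i, \tau_j)\right)\right] \ge 0$. Maximizing the right-hand side with respect to both the awareness encoder and $q_\xi$ therefore pushes up the mutual information itself, which is one common way to make such an objective tractable.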