Designing an effective communication mechanism among agents in reinforcement learning is challenging, especially for real-world applications. In real-world scenarios, the number of agents can grow, or an environment may need to interact with a changing number of agents. To be practical, a multi-agent framework therefore needs to handle varying agent populations in terms of both scale and dynamics. We formulate multi-agent environments with different numbers of agents as a multi-task problem and propose a meta reinforcement learning (meta-RL) framework to tackle it. The proposed framework employs a meta-learned Communication Pattern Recognition (CPR) module to identify communication behavior and extract information that facilitates training. Experimental results demonstrate that the proposed framework (a) generalizes to a larger, unseen number of agents and (b) allows the number of agents to change between episodes. An ablation study is also provided to justify the proposed CPR design and show that it is effective.
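To make the multi-task formulation concrete, the following is a minimal sketch, not the proposed CPR module. It assumes that a task is identified only by its number of agents, and that per-agent messages are mean-pooled into a fixed-size vector so that downstream components do not depend on the agent count. The mean pooling is a stand-in chosen for illustration. The names `TRAIN_AGENT_COUNTS`, `EVAL_AGENT_COUNT`, `sample_task`, `pool_messages`, and `random_messages` are hypothetical.

```python
import random
from typing import List, Sequence

# Hypothetical task distribution: each task is one agent count.
TRAIN_AGENT_COUNTS = (2, 3, 4)
# Hypothetical held-out task with more agents than any training task.
EVAL_AGENT_COUNT = 8


def sample_task(rng: random.Random,
                counts: Sequence[int] = TRAIN_AGENT_COUNTS) -> int:
    """Sample one task, identified here only by its number of agents."""
    return rng.choice(list(counts))


def pool_messages(messages: List[List[float]]) -> List[float]:
    """Mean-pool per-agent message vectors into one fixed-size vector.

    The output length equals the message dimension, not the number of
    agents, so the same consumer works for any agent count.
    """
    if not messages:
        raise ValueError("need at least one message")
    dim = len(messages[0])
    if any(len(m) != dim for m in messages):
        raise ValueError("all messages must share one dimension")
    n = len(messages)
    return [sum(m[i] for m in messages) / n for i in range(dim)]


def random_messages(rng: random.Random, n_agents: int,
                    dim: int = 4) -> List[List[float]]:
    """Placeholder messages; a real system would emit learned messages."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for _ in range(n_agents)]


if __name__ == "__main__":
    rng = random.Random(0)
    # The agent count may change from one episode to the next.
    for episode in range(3):
        n_agents = sample_task(rng)
        pooled = pool_messages(random_messages(rng, n_agents))
        print(episode, n_agents, len(pooled))
    # Evaluation on an unseen, larger number of agents.
    pooled = pool_messages(random_messages(rng, EVAL_AGENT_COUNT))
    print("eval", EVAL_AGENT_COUNT, len(pooled))
```

In this sketch the pooled vector always has the message dimension as its length, which is what lets the agent count vary between episodes and at evaluation time without changing input shapes.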