Scalability is one of the essential challenges for multi-agent reinforcement learning (MARL) algorithms to be applied to real-world problems, which typically involve a massive number of agents. To this end, parameter sharing across agents has been widely used, since it reduces training time by decreasing the number of parameters and increasing sample efficiency. However, using the same parameters across agents limits the representational capacity of the joint policy, and consequently performance can degrade in multi-agent tasks that require different behaviors from different agents. In this paper, we propose a simple method that adopts structured pruning of a deep neural network to increase the representational capacity of the joint policy without introducing additional parameters. We evaluate the proposed method on several benchmark tasks, and numerical results show that it significantly outperforms other parameter-sharing methods.
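To make the core idea concrete, the following is a minimal sketch (not the paper's implementation) of how structured pruning can diversify agents under parameter sharing: all agents share one weight matrix, but each agent's forward pass keeps only a fixed subset of whole hidden units, so agents realize different subnetworks without any additional trainable parameters. The layer sizes, mask construction, and ReLU activation here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, in_dim, hidden = 3, 4, 8

# One weight matrix shared by all agents (standard parameter sharing).
W_shared = rng.standard_normal((hidden, in_dim))

# Structured pruning: each agent keeps a different subset of whole
# hidden units (rows of W_shared). The binary masks are fixed, not
# trained, so no extra parameters are introduced.
masks = np.zeros((n_agents, hidden))
for i in range(n_agents):
    kept = rng.choice(hidden, size=hidden // 2, replace=False)
    masks[i, kept] = 1.0

def agent_forward(agent_id, obs):
    """Forward pass through agent `agent_id`'s pruned subnetwork."""
    h = (masks[agent_id][:, None] * W_shared) @ obs  # zero out pruned units
    return np.maximum(h, 0.0)  # ReLU

obs = rng.standard_normal(in_dim)
outputs = [agent_forward(i, obs) for i in range(n_agents)]
# Distinct masks let agents behave differently despite shared weights.
```

Because the masks prune entire units (rows) rather than individual weights, the pruning is "structured" and each subnetwork remains a dense, efficiently computable layer.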