Sharing parameters in multi-agent deep reinforcement learning has played an essential role in allowing algorithms to scale to a large number of agents. Parameter sharing between agents significantly decreases the number of trainable parameters, shortening training times to tractable levels, and has been linked to more efficient learning. However, having all agents share the same parameters can also have a detrimental effect on learning. We demonstrate the impact of parameter sharing methods on training speed and converged returns, establishing that when applied indiscriminately, their effectiveness is highly dependent on the environment. We propose a novel method to automatically identify agents which may benefit from sharing parameters by partitioning them based on their abilities and goals. Our approach combines the increased sample efficiency of parameter sharing with the representational capacity of multiple independent networks to reduce training time and increase final returns.
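To make the idea of selective parameter sharing concrete, here is a minimal sketch, not the paper's implementation: it assumes a hand-specified partition of agents into groups (in the proposed method this partition would be identified automatically from the agents' abilities and goals), and routes each agent's observations through its group's shared policy network, so agents within a group share parameters while groups remain independent.

```python
# Minimal sketch of group-based (selective) parameter sharing.
# The partition `agent_to_group` is a hypothetical, hand-chosen example.
import torch
import torch.nn as nn


class Policy(nn.Module):
    """Small feed-forward policy producing action logits from observations."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


# Hypothetical partition: agents 0 and 1 share one network, agent 2 has its own.
agent_to_group = {0: 0, 1: 0, 2: 1}
n_groups = len(set(agent_to_group.values()))

obs_dim, n_actions = 8, 4
group_policies = nn.ModuleList(Policy(obs_dim, n_actions) for _ in range(n_groups))


def act(agent_id: int, obs: torch.Tensor) -> torch.Tensor:
    """Route an agent's observation through its group's shared network."""
    return group_policies[agent_to_group[agent_id]](obs)


# Example: three agents select actions; agents 0 and 1 reuse the same parameters.
obs_batch = torch.randn(3, obs_dim)
actions = [act(i, obs_batch[i : i + 1]).argmax(dim=-1) for i in range(3)]
```

With full sharing (`n_groups = 1`) the trainable parameter count is minimal but all agents are forced onto one representation; with no sharing (one group per agent) each agent learns independently at greater cost. The grouped scheme sits between these extremes.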