Curriculum reinforcement learning (CRL) aims to speed up learning of a task by gradually increasing its difficulty from easy to hard through control of factors such as the initial state or the environment dynamics. While automated CRL is well studied in the single-agent setting, in multi-agent reinforcement learning (MARL) it remains an open question whether controlling the number of agents alongside other factors in a principled manner is beneficial; prior approaches typically rely on hand-crafted heuristics. In addition, how tasks evolve as the number of agents changes remains understudied, which is critical for scaling to more challenging tasks. We introduce self-paced MARL (SPMARL), which optimizes the number of agents jointly with other environment factors in a principled way, and we show that common assumptions, such as that fewer agents always make the task easier, are not generally valid. The curriculum induced by SPMARL reveals how tasks evolve with respect to the number of agents, and experiments show that SPMARL improves performance when the number of agents sufficiently influences task difficulty.
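To make the idea concrete, the following toy sketch treats the number of agents as a curriculum variable that is advanced toward the target task only while measured performance allows. Everything here is hypothetical: `toy_return` is an illustrative difficulty model (assuming more agents = harder, which the abstract notes need not hold in general), and the threshold rule is a simple stand-in for, not a reproduction of, SPMARL's self-paced optimization.

```python
# Hypothetical curriculum sketch: `mu` is a continuous curriculum variable
# for the number of agents, advanced toward the target task only while the
# evaluated return stays above a threshold. This is a toy threshold-based
# stand-in, not SPMARL's actual self-paced objective.

def toy_return(n_agents: int) -> float:
    # Illustrative difficulty model: here, more agents = harder.
    # SPMARL's point is that this monotonicity is not generally valid.
    return 1.0 / n_agents

def curriculum_step(mu: float, target: float, perf: float,
                    threshold: float = 0.3, step: float = 0.5) -> float:
    """Advance the curriculum toward the target only when performance allows."""
    if perf >= threshold:
        mu = min(mu + step, target)
    return mu

mu, target = 1.0, 8.0
for _ in range(20):
    n_agents = int(round(mu))
    perf = toy_return(n_agents)  # stand-in for the evaluated policy return
    mu = curriculum_step(mu, target, perf)

print(mu)
```

Under this toy model the curriculum stalls once the sampled task becomes too hard for the fixed threshold, which illustrates why a principled trade-off between task difficulty and progress toward the target distribution, as in self-paced approaches, is useful.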