Training a team to complete a complex task via multi-agent reinforcement learning can be difficult due to challenges such as policy search in a large policy space, and non-stationarity caused by mutually adapting agents. To facilitate efficient learning of complex multi-agent tasks, we propose an approach which uses an expert-provided curriculum of simpler multi-agent sub-tasks. In each sub-task of the curriculum, a subset of the entire team is trained to acquire sub-task-specific policies. The sub-teams are then merged and transferred to the target task, where their policies are collectively fine-tuned to solve the more complex target task. We present MEDoE, a flexible method which identifies situations in the target task where each agent can use its sub-task-specific skills, and uses this information to modulate hyperparameters for learning and exploration during the fine-tuning process. We compare MEDoE to multi-agent reinforcement learning baselines that train from scratch in the full task, and to na\"ive applications of standard multi-agent reinforcement learning techniques for fine-tuning. We show that MEDoE outperforms baselines which train from scratch or use na\"ive fine-tuning approaches, requiring significantly fewer total training timesteps to solve a range of complex teamwork tasks.
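As a rough illustration of the core idea described above, the following is a minimal, hypothetical sketch (not the authors' implementation) of modulating each agent's learning and exploration hyperparameters from a per-agent degree-of-expertise signal; all names, the scaling scheme, and the `boost` parameter are illustrative assumptions.

\begin{verbatim}
# Hypothetical sketch: scale fine-tuning hyperparameters by degree of expertise.
from dataclasses import dataclass

@dataclass
class AgentHyperparams:
    lr: float           # learning rate used for fine-tuning updates
    temperature: float  # exploration temperature (e.g. for a Boltzmann policy)

def modulate_hyperparams(expertise: float,
                         base: AgentHyperparams,
                         boost: float = 10.0) -> AgentHyperparams:
    """Reduce learning and exploration when the agent is already expert in the
    current situation, and increase them when it is not.

    `expertise` in [0, 1] is assumed to come from a per-agent classifier that
    detects whether the current state resembles the agent's source sub-task.
    """
    novelty = 1.0 - expertise
    return AgentHyperparams(
        lr=base.lr * (1.0 + (boost - 1.0) * novelty),
        temperature=base.temperature * (1.0 + (boost - 1.0) * novelty),
    )

# Example: an agent judged 90% "in-expertise" fine-tunes conservatively,
# while a 10%-expertise agent learns and explores more aggressively.
base = AgentHyperparams(lr=3e-4, temperature=1.0)
print(modulate_hyperparams(0.9, base))
print(modulate_hyperparams(0.1, base))
\end{verbatim}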