A critical challenge in multi-agent reinforcement learning (MARL) is for multiple agents to efficiently accomplish complex, long-horizon tasks. Agents often have difficulty cooperating on common goals, dividing complex tasks, and planning through several stages to make progress. We propose to address these challenges by guiding agents with programs designed for parallelization, since programs as a representation contain rich structural and semantic information and are widely used as abstractions for long-horizon tasks. Specifically, we introduce Efficient Multi-Agent Reinforcement Learning with Parallel Program Guidance (E-MAPP), a novel framework that leverages parallel programs to guide multiple agents to efficiently accomplish goals that require planning over $10+$ stages. E-MAPP integrates the structural information from a parallel program, promotes cooperative behaviors grounded in program semantics, and improves time efficiency via a task allocator. We conduct extensive experiments on a series of challenging, long-horizon cooperative tasks in the Overcooked environment. Results show that E-MAPP outperforms strong baselines in terms of completion rate, time efficiency, and zero-shot generalization ability by a large margin.
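To make the core idea concrete, the following is a minimal, hypothetical sketch (not taken from the paper) of what a parallel program guiding multiple agents might look like: subtask branches without mutual dependencies can run concurrently, and a toy greedy allocator assigns each ready subtask to an idle agent. All names here (`Subtask`, `ParallelProgram`, `greedy_allocate`) are illustrative assumptions, not E-MAPP's actual interface.

```python
# Hypothetical sketch of parallel-program-guided task allocation.
# These names are illustrative only; they do not come from the E-MAPP paper.
from dataclasses import dataclass, field


@dataclass
class Subtask:
    name: str
    deps: list = field(default_factory=list)  # subtasks that must finish first
    done: bool = False


@dataclass
class ParallelProgram:
    subtasks: list  # branches with no mutual dependencies may run in parallel

    def ready(self):
        # A subtask is ready when it is unfinished and all its deps are done.
        return [t for t in self.subtasks
                if not t.done and all(d.done for d in t.deps)]


def greedy_allocate(program, idle_agents):
    """Assign each ready subtask to an idle agent (toy allocator)."""
    return dict(zip(idle_agents, program.ready()))


# Example: chopping two ingredients can proceed in parallel across agents,
# while cooking depends on both chopping subtasks being finished.
chop_tomato = Subtask("chop_tomato")
chop_lettuce = Subtask("chop_lettuce")
cook = Subtask("cook_soup", deps=[chop_tomato, chop_lettuce])
program = ParallelProgram([chop_tomato, chop_lettuce, cook])

print(greedy_allocate(program, ["agent_0", "agent_1"]))
# -> agent_0 gets chop_tomato, agent_1 gets chop_lettuce; cook_soup waits.
```

This toy allocator captures only the scheduling intuition; E-MAPP's actual allocator is learned and grounded in program semantics rather than a fixed greedy rule.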