Flocking control is a challenging problem in which multiple agents, such as drones or vehicles, must reach a target position while maintaining the flock and avoiding collisions with obstacles in the environment as well as with one another. Multi-agent reinforcement learning (MARL) has achieved promising performance on flocking control. However, methods based on traditional reinforcement learning require a considerable number of interactions between the agents and the environment. This paper proposes a sub-optimal policy aided multi-agent reinforcement learning algorithm (SPA-MARL) to boost sample efficiency. SPA-MARL directly leverages a prior policy, which can be manually designed or obtained with a non-learning method and whose performance may be sub-optimal, to aid agents in learning. SPA-MARL measures the performance gap between the sub-optimal policy and itself, and imitates the sub-optimal policy whenever that policy performs better. We apply SPA-MARL to the flocking control problem, using a traditional control method based on artificial potential fields to generate the sub-optimal policy. Experiments demonstrate that SPA-MARL speeds up training and outperforms both the MARL baseline and the sub-optimal policy it uses.
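The following is a minimal sketch, assuming a PyTorch actor-critic setup, of the performance-gated imitation idea described above: an imitation term is added to the actor loss only when the sub-optimal (e.g., potential-field) policy achieves a higher episode return than the learning agent. Names such as `agent_return`, `prior_return`, and `imitation_coef` are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def spa_actor_loss(policy_logits, actions_taken, advantages,
                   prior_actions, agent_return, prior_return,
                   imitation_coef=0.5):
    """Policy-gradient loss augmented with a gated imitation term (sketch)."""
    log_probs = F.log_softmax(policy_logits, dim=-1)

    # Standard policy-gradient term on the agent's own experience.
    pg_loss = -(log_probs.gather(-1, actions_taken.unsqueeze(-1)).squeeze(-1)
                * advantages).mean()

    # Imitation term: push the policy toward the sub-optimal policy's actions,
    # switched on only when that policy currently performs better.
    if prior_return > agent_return:
        imit_loss = F.nll_loss(log_probs, prior_actions)
    else:
        imit_loss = torch.zeros((), device=policy_logits.device)

    return pg_loss + imitation_coef * imit_loss
```

In this sketch the gate compares scalar episode returns; any measure of the performance gap between the prior policy and the learning agent could play the same role.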