Many real-world applications can be formulated as multi-agent cooperation problems, such as network packet routing and the coordination of autonomous vehicles. The emergence of deep reinforcement learning (DRL) provides a promising approach to multi-agent cooperation through the interaction between agents and environments. However, traditional DRL solutions suffer from the high dimensionality of the joint continuous action space of multiple agents during policy search. Moreover, because agents' policies change during learning, the training environment is non-stationary. To tackle these issues, we propose a hierarchical reinforcement learning approach with high-level decision-making and low-level individual control for efficient policy search. In particular, the cooperation among multiple agents can be learned efficiently in the high-level discrete action space, while low-level individual control reduces to single-agent reinforcement learning. In addition to hierarchical reinforcement learning, we propose an opponent modeling network to model other agents' policies during the learning process. In contrast to end-to-end DRL approaches, our approach reduces the learning complexity by decomposing the overall task into sub-tasks in a hierarchical way. To evaluate the efficiency of our approach, we conduct a real-world case study on the cooperative lane-change scenario. Both simulation and real-world experiments show the superiority of our approach in terms of collision rate and convergence speed.
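To make the hierarchical decomposition concrete, the sketch below illustrates one plausible structure for such an agent: a high-level network selects a discrete cooperative maneuver, an opponent-modeling head predicts other agents' actions, and a low-level network outputs continuous control conditioned on both. This is a minimal illustrative sketch, not the paper's implementation; all module names (HighLevelPolicy, OpponentModel, LowLevelController), layer sizes, and the 2-D continuous action are assumptions.

```python
# Illustrative sketch of a hierarchical multi-agent policy with opponent modeling.
# NOTE: names, dimensions, and architecture choices are assumptions for illustration only.
import torch
import torch.nn as nn


class HighLevelPolicy(nn.Module):
    """Maps an observation to a distribution over discrete cooperative maneuvers
    (e.g., keep-lane / change-left / change-right in the lane-change case)."""
    def __init__(self, obs_dim: int, n_options: int = 3, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_options),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return torch.softmax(self.net(obs), dim=-1)


class OpponentModel(nn.Module):
    """Predicts other agents' continuous actions from the local observation."""
    def __init__(self, obs_dim: int, n_opponents: int, act_dim: int = 2, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_opponents * act_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


class LowLevelController(nn.Module):
    """Single-agent continuous controller conditioned on the chosen maneuver
    and the predicted opponent actions."""
    def __init__(self, obs_dim: int, n_options: int, opp_dim: int,
                 act_dim: int = 2, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + n_options + opp_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),  # bounded control, e.g. steering/throttle
        )

    def forward(self, obs, option_onehot, opp_pred):
        return self.net(torch.cat([obs, option_onehot, opp_pred], dim=-1))


if __name__ == "__main__":
    obs_dim, n_options, n_opponents, act_dim = 16, 3, 2, 2
    high = HighLevelPolicy(obs_dim, n_options)
    opp = OpponentModel(obs_dim, n_opponents, act_dim)
    low = LowLevelController(obs_dim, n_options, n_opponents * act_dim, act_dim)

    obs = torch.randn(1, obs_dim)
    option_probs = high(obs)                              # high-level discrete decision
    option = torch.argmax(option_probs, dim=-1)
    option_onehot = nn.functional.one_hot(option, n_options).float()
    opp_pred = opp(obs)                                   # predicted opponent actions
    action = low(obs, option_onehot, opp_pred)            # low-level continuous control
    print(option.item(), action)
```

Under this decomposition, the high-level choice lives in a small discrete space (so cooperative behavior is easier to search), while each low-level controller is trained as a single-agent problem, consistent with the decomposition described above.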