Self-Driven Particles (SDP) describe a category of multi-agent systems common in everyday life, such as flocking birds and traffic flows. In an SDP system, each agent pursues its own goal and constantly changes its cooperative or competitive behaviors toward nearby agents. Manually designing controllers for such SDP systems is time-consuming, and the resulting emergent behaviors are often neither realistic nor generalizable. Thus the realistic simulation of SDP systems remains challenging. Reinforcement learning provides an appealing alternative for automating the development of SDP controllers. However, previous multi-agent reinforcement learning (MARL) methods define the agents to be teammates or enemies beforehand, and so fail to capture the essence of SDP, where each agent's role shifts between cooperative and competitive even within one episode. To simulate SDP with MARL, a key challenge is to coordinate agents' behaviors while still maximizing individual objectives. Taking traffic simulation as the testbed, in this work we develop a novel MARL method called Coordinated Policy Optimization (CoPO), which incorporates a social psychology principle to learn neural controllers for SDP. Experiments show that the proposed method achieves superior performance compared to MARL baselines on various metrics. Notably, the trained vehicles exhibit complex and diverse social behaviors that improve the performance and safety of the population as a whole. Demo video and source code are available at: https://decisionforce.github.io/CoPO/
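To make the coordination challenge concrete, the following is a minimal sketch that is not taken from the paper. It assumes each agent's training signal blends its own reward with the mean reward of agents within a fixed radius, weighted by a hypothetical angle `phi`. The function name `coordinated_rewards`, the radius-based neighborhood, and the cosine/sine weighting are all illustrative assumptions; the abstract does not specify CoPO's actual formulation.

```python
import math


def coordinated_rewards(positions, rewards, radius, phi):
    """Blend each agent's own reward with its neighbors' mean reward.

    Hypothetical illustration only (not the paper's stated method):
    phi = 0 yields purely individual rewards, phi = pi/2 yields
    purely neighbor-driven rewards.
    """
    blended = []
    for i, (xi, yi) in enumerate(positions):
        # Collect rewards of the other agents within `radius` of agent i.
        nearby = [
            rewards[j]
            for j, (xj, yj) in enumerate(positions)
            if j != i and math.hypot(xi - xj, yi - yj) <= radius
        ]
        # An isolated agent has no neighbor term (assumed to be zero).
        neighbor_mean = sum(nearby) / len(nearby) if nearby else 0.0
        blended.append(
            math.cos(phi) * rewards[i] + math.sin(phi) * neighbor_mean
        )
    return blended


if __name__ == "__main__":
    # Two vehicles close together and one far away (made-up numbers).
    positions = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
    rewards = [1.0, 3.0, 5.0]
    print(coordinated_rewards(positions, rewards, radius=2.0, phi=math.pi / 4))
```

Under these assumptions, a single scalar `phi` moves an agent between self-interested and neighbor-oriented behavior, which is one simple way to express roles that are neither fixed teammates nor fixed enemies.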