Training multiple agents to perform safe and cooperative control in complex autonomous-driving scenarios remains a challenge. For a small fleet of cars moving together, this paper proposes Lepus, a new approach to training multiple agents. Lepus trains the agents in a purely cooperative manner, with shared policy-network parameters and a shared reward function across agents. In particular, Lepus pre-trains the policy networks through an adversarial process, which improves their collaborative decision-making capability and, in turn, the stability of driving. Moreover, to alleviate the problem of sparse rewards, Lepus learns an approximate reward function from expert trajectories by combining a random network with a distillation network. We conduct extensive experiments on the MADRaS simulation platform. The results show that agents trained by Lepus avoid collisions as much as possible while driving simultaneously, and that Lepus outperforms four baseline methods, namely DDPG-FDE, PSDDPG, MADDPG, and MAGAIL(DDPG), in terms of stability.
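To make the random-network/distillation idea concrete, the following is a minimal sketch of random-network-distillation-style reward shaping: a frozen, randomly initialized target network and a trainable predictor (distillation) network are fit on expert states, and the predictor's error is mapped to a dense surrogate reward. The network sizes, the exponential reward mapping, and the helper names (RNDReward, distill_on_expert) are illustrative assumptions, not the exact design used in Lepus.

```python
import torch
import torch.nn as nn

class RNDReward(nn.Module):
    """Random-network-distillation-style surrogate reward (illustrative sketch)."""

    def __init__(self, state_dim: int, feat_dim: int = 64):
        super().__init__()
        # Fixed random target network: never trained.
        self.target = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim))
        # Distillation (predictor) network: trained to match the target on expert states.
        self.predictor = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim))
        for p in self.target.parameters():
            p.requires_grad_(False)

    def prediction_error(self, states: torch.Tensor) -> torch.Tensor:
        # Mean-squared error between predictor and frozen target features.
        return ((self.predictor(states) - self.target(states)) ** 2).mean(dim=-1)

    def reward(self, states: torch.Tensor) -> torch.Tensor:
        # Low error on expert-like states maps to a higher surrogate reward
        # (assumed mapping; the paper may use a different transformation).
        return torch.exp(-self.prediction_error(states))


def distill_on_expert(model: RNDReward, expert_states: torch.Tensor, steps: int = 1000):
    """Fit the predictor to the frozen target on expert states only."""
    opt = torch.optim.Adam(model.predictor.parameters(), lr=1e-4)
    for _ in range(steps):
        loss = model.prediction_error(expert_states).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
```

Under this sketch, the learned `reward` can stand in for the sparse environment reward when training the shared policy networks, since it provides a dense signal for states that resemble the expert trajectories.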