We study the multi-agent safe control problem, in which agents must avoid collisions with static obstacles and with each other while reaching their goals. Our core idea is to learn the multi-agent control policy jointly with control barrier functions that serve as safety certificates. We propose a novel joint-learning framework that can be implemented in a decentralized fashion, with generalization guarantees for certain function classes. Such a decentralized framework can adapt to an arbitrarily large number of agents. Building upon this framework, we further improve scalability by incorporating neural network architectures that are invariant to the number and permutation of neighboring agents. In addition, we propose a new spontaneous policy refinement method to further enforce the certificate condition at test time. We provide extensive experiments demonstrating that our method significantly outperforms other leading multi-agent control approaches in both maintaining safety and completing the original tasks. Our approach also generalizes well: a control policy trained with 8 agents in one scenario can be used in other scenarios with up to 1024 agents, in complex multi-agent environments and dynamics.
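To make the invariance property concrete, the following is a minimal sketch of one common way to obtain a feature that does not depend on the number or ordering of neighboring agents: apply the same layer to every neighbor and then take an element-wise maximum. The abstract does not specify the architecture, so the function name `encode_neighbors`, the 4-dimensional neighbor state, the 16-dimensional feature, and the random weights are all assumptions made purely for illustration.

```python
import numpy as np


def encode_neighbors(neighbor_states, weight, bias):
    """Map a variable-size set of neighbor states to a fixed-size feature.

    Hypothetical helper: a shared linear layer with ReLU is applied to each
    neighbor, followed by an element-wise max over the neighbor axis.
    """
    neighbor_states = np.asarray(neighbor_states, dtype=float)
    if neighbor_states.shape[0] == 0:
        # No neighbors in range: fall back to an all-zero feature.
        return np.zeros(weight.shape[1])
    # Shared per-neighbor layer, shape (num_neighbors, feature_dim).
    per_neighbor = np.maximum(neighbor_states @ weight + bias, 0.0)
    # Max over neighbors removes dependence on their order and count.
    return per_neighbor.max(axis=0)


rng = np.random.default_rng(0)
weight = rng.normal(size=(4, 16))  # assumed sizes: 4-dim state, 16-dim feature
bias = rng.normal(size=16)

neighbors = rng.normal(size=(5, 4))
shuffled = neighbors[rng.permutation(5)]

feature = encode_neighbors(neighbors, weight, bias)
feature_shuffled = encode_neighbors(shuffled, weight, bias)
feature_many = encode_neighbors(rng.normal(size=(50, 4)), weight, bias)
feature_none = encode_neighbors(np.zeros((0, 4)), weight, bias)
```

In this sketch, `feature` and `feature_shuffled` agree, and the output has the same size for 0, 5, or 50 neighbors, so a downstream policy or certificate network with a fixed input size could in principle be reused as the number of agents changes.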