Safety is of great importance in multi-robot navigation problems. In this paper, we propose a control barrier function (CBF) based optimizer that ensures robot safety with both high probability and flexibility, using only sensor measurements. The optimizer takes action commands from the policy network as initial values and refines them, driving potentially dangerous commands back into safe regions. With the help of a deep transition model that predicts the evolution of the surrounding dynamics and the consequences of different actions, the CBF module can guide the optimization over a reasonable time horizon. We also present a novel joint training framework that uses reward feedback from the CBF module to improve the cooperation between the Reinforcement Learning (RL) based policy and the CBF-based optimizer during both training and inference. We observe that a policy trained with our method achieves a higher success rate while maintaining the safety of multiple robots, and does so in significantly fewer episodes than other methods. Experiments are conducted in multiple scenarios, both in simulation and in the real world, and the results demonstrate the effectiveness of our method in maintaining the safety of multi-robot navigation. Code is available at \url{https://github.com/YuxiangCui/MARL-OCBF
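The refinement step described in the abstract, where a policy action is used as the initial value and then pushed back into the safe region, can be illustrated with a minimal sketch. This is not the authors' implementation: the single-integrator `transition` function stands in for the learned deep transition model, and the distance-based `barrier`, the discrete-time condition h(x') >= (1 - gamma) h(x), the finite-difference gradient descent, and all parameter values are assumptions chosen for illustration only.

```python
import numpy as np

def transition(state, action, dt=0.1):
    """Hypothetical stand-in for the learned transition model (single integrator)."""
    return state + dt * action

def barrier(state, obstacle, safe_dist=0.5):
    """Assumed barrier h(x): non-negative when the robot is outside the safety radius."""
    return float(np.sum((state - obstacle) ** 2) - safe_dist ** 2)

def cbf_violation(state, action, obstacle, gamma=0.5):
    """Amount by which h(x') >= (1 - gamma) * h(x) is violated (0.0 if satisfied)."""
    h_now = barrier(state, obstacle)
    h_next = barrier(transition(state, action), obstacle)
    return max(0.0, (1.0 - gamma) * h_now - h_next)

def refine_action(state, policy_action, obstacle, steps=100, lr=5.0, eps=1e-4):
    """Start from the policy action and descend on the CBF violation until it vanishes."""
    action = np.array(policy_action, dtype=float)
    for _ in range(steps):
        if cbf_violation(state, action, obstacle) == 0.0:
            break  # already safe: leave the action as it is
        # Central finite-difference gradient, so no model internals are needed.
        grad = np.zeros_like(action)
        for i in range(action.size):
            delta = np.zeros_like(action)
            delta[i] = eps
            grad[i] = (cbf_violation(state, action + delta, obstacle)
                       - cbf_violation(state, action - delta, obstacle)) / (2 * eps)
        action -= lr * grad
    return action

if __name__ == "__main__":
    state = np.zeros(2)
    obstacle = np.array([1.0, 0.0])
    # A command heading straight at the obstacle is slowed down by the refinement.
    print(refine_action(state, [5.0, 0.0], obstacle))
```

In this sketch an action that already satisfies the condition is returned unchanged, which mirrors the idea that the optimizer only intervenes on potentially dangerous commands. A practical system would replace the hand-written dynamics with the learned model and would likely use a proper constrained solver rather than plain gradient descent.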