Multi-agent collision-free trajectory planning and control subject to different goal requirements and system dynamics has been extensively studied, and has recently been gaining attention in machine and reinforcement learning. However, particularly when a large number of agents is involved, constructing a least-restrictive collision avoidance policy is of utmost importance for both classical and learning-based methods. In this paper, we propose a Least-Restrictive Collision Avoidance Module (LR-CAM) that evaluates the safety of a multi-agent system and takes over control only when needed to prevent collisions. The LR-CAM is a single policy that can be wrapped around the policies of all agents in a multi-agent system. It allows each agent to pursue any objective as long as it is safe to do so. The benefit of the proposed least-restrictive policy is that it interrupts and overrules the default controller only when an otherwise inevitable collision is imminent. We use a Long Short-Term Memory (LSTM) based Variational Auto-Encoder (VAE) to enable the LR-CAM to account for a varying number of agents in the environment. Moreover, we propose an off-policy meta-reinforcement learning framework with a novel reward function based on a Hamilton-Jacobi value function to train the LR-CAM. The proposed method is fully meta-trained in a ROS-based simulation and tested on a real multi-agent system. Our results show that the LR-CAM outperforms the classical least-restrictive baseline by 30 percent. In addition, we show that even if only a subset of agents in a multi-agent system use the LR-CAM, the success rate of all agents increases significantly.
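The least-restrictive switching idea described above can be sketched conceptually: the default controller acts freely, and the safety module overrides it only when a safety value function signals imminent danger. The sketch below is purely illustrative; `value_fn`, the threshold, and the action names are placeholder assumptions, not the paper's learned LR-CAM.

```python
def least_restrictive_step(state, default_action, safe_action,
                           value_fn, threshold=0.0):
    """Illustrative least-restrictive safety filter (not the paper's
    implementation). value_fn plays the role of a Hamilton-Jacobi-style
    safety value: positive means the state is recoverable, so the
    default controller keeps authority; otherwise the avoidance
    action overrides it."""
    if value_fn(state) > threshold:
        return default_action   # safe: do not interfere
    return safe_action          # imminent danger: take over control


# Toy usage: treat the scalar state itself as the safety value.
toy_value = lambda s: s
print(least_restrictive_step(1.0, "pursue_goal", "avoid", toy_value))
print(least_restrictive_step(-1.0, "pursue_goal", "avoid", toy_value))
```

In practice the value function would be evaluated on the joint state of nearby agents, which is why the paper encodes a varying number of agents before querying safety.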