Search and Rescue (SAR) missions in remote environments often employ autonomous multi-robot systems that learn, plan, and execute a combination of local single-robot control actions, group primitives, and global mission-oriented coordination and collaboration. Often, SAR coordination strategies are manually designed by human experts who can remotely control the multi-robot system and enable semi-autonomous operations. However, in remote environments where connectivity is limited and human intervention is often not possible, decentralized collaboration strategies are needed for fully autonomous operations. Such decentralized coordination may nevertheless be ineffective in adversarial environments due to sensor noise, actuation faults, or manipulation of inter-agent communication data. In this paper, we propose an algorithmic approach based on adversarial multi-agent reinforcement learning (MARL) that allows robots to efficiently coordinate their strategies in the presence of adversarial inter-agent communications. In our setup, the objective of the multi-robot team is to discover targets strategically in an obstacle-strewn geographical area by minimizing the average time needed to find the targets. It is assumed that the robots have no prior knowledge of the target locations, and they can interact with only a subset of neighboring robots at any time. Based on the centralized training with decentralized execution (CTDE) paradigm in MARL, we utilize a hierarchical meta-learning framework to learn dynamic team-coordination modalities and discover emergent team behavior under complex cooperative-competitive scenarios. The effectiveness of our approach is demonstrated on a collection of prototype grid-world environments with different specifications of benign and adversarial agents, target locations, and agent rewards.
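The setup described above — a robot team searching a grid world for hidden targets under a shared reward that penalizes elapsed time — can be sketched as a minimal environment. This is an illustrative sketch only, not the paper's implementation: the class name `GridSearchEnv`, the step penalty of -1, the discovery bonus of +10, and the action encoding are all assumptions chosen for clarity.

```python
import random


class GridSearchEnv:
    """Illustrative sketch (assumed, not the paper's code): a team of robots
    searches an n x n grid for hidden targets under a shared team reward."""

    def __init__(self, size=8, n_robots=3, n_targets=2, seed=0):
        self.rng = random.Random(seed)
        self.size = size
        # Robots start along the top row; targets are hidden elsewhere.
        self.robots = [(0, i % size) for i in range(n_robots)]
        occupied = set(self.robots)
        self.targets = set()
        while len(self.targets) < n_targets:
            cell = (self.rng.randrange(size), self.rng.randrange(size))
            if cell not in occupied:
                self.targets.add(cell)
        self.t = 0

    def step(self, actions):
        """actions: one move per robot; 0=stay, 1=up, 2=down, 3=left, 4=right.
        Returns (shared_reward, done)."""
        moves = {0: (0, 0), 1: (-1, 0), 2: (1, 0), 3: (0, -1), 4: (0, 1)}
        self.t += 1
        for i, a in enumerate(actions):
            dr, dc = moves[a]
            r, c = self.robots[i]
            # Clamp moves to the grid boundary.
            self.robots[i] = (min(max(r + dr, 0), self.size - 1),
                              min(max(c + dc, 0), self.size - 1))
        found = self.targets & set(self.robots)
        self.targets -= found
        # Shared team reward: -1 per step (so minimizing discovery time
        # maximizes return), plus a bonus per newly discovered target.
        reward = -1 + 10 * len(found)
        done = not self.targets
        return reward, done
```

Because the reward is shared and accrues -1 every step, maximizing the team's return is equivalent to minimizing the average time to find all targets, matching the objective stated in the abstract; adversarial agents would be modeled by corrupting the messages exchanged between neighboring robots, which this sketch omits.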