Learning Robot Swarm Tactics over Complex Adversarial Environments

Amir Behjat, Hemanth Manjunatha, Prajit KrisshnaKumar, Apurv Jani, Leighton Collins, Payam Ghassemi, Joseph Distefano, David Doermann, Karthik Dantu, Ehsan Esfahani, Souma Chowdhury

From arXiv; accepted to the IEEE International Symposium on Multi-Robot and Multi-Agent Systems 2021.

Accomplishing complex swarm robotic missions in the real world requires planning and executing a combination of single-robot behaviors, group primitives such as task allocation, path planning, and formation control, and mission-specific objectives such as target search and group coverage. Most such missions are designed manually by teams of robotics experts. Recent work on automated approaches to learning swarm behavior has been limited to individual primitives, with only sparse work on learning complete missions. This paper presents a systematic approach to learning tactical, mission-specific policies that compose primitives in a swarm to accomplish the mission efficiently, using neural networks with special input and output encoding. To learn swarm tactics in an adversarial environment, we employ a combination of 1) map-to-graph abstraction, 2) input/output encoding via Pareto filtering of points of interest and clustering of robots, and 3) learning via neuroevolution and policy gradient approaches. We illustrate that this combination is critical to making learning tractable, especially given the computational cost of simulating swarm missions of this scale and complexity. Successful mission completion is demonstrated with up to 60 robots. In addition, a close match between the performance statistics in training and testing scenarios indicates the potential generalizability of the proposed framework.
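The abstract names Pareto filtering of points of interest as part of the input/output encoding but does not say which criteria are used. As a minimal sketch of the general technique only, the Python below keeps the non-dominated points from a candidate list. The two objectives (distance and risk, both minimized), the sample values, and the function name `pareto_filter` are assumptions made for illustration and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def pareto_filter(points: Sequence[Tuple[float, ...]]) -> List[int]:
    """Return indices of non-dominated points, assuming every objective is minimized."""

    def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
        # a dominates b if it is no worse in every objective and strictly better in one.
        no_worse = all(x <= y for x, y in zip(a, b))
        strictly_better = any(x < y for x, y in zip(a, b))
        return no_worse and strictly_better

    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Hypothetical points of interest as (distance, risk) pairs; values are made up.
pois = [(1.0, 5.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0), (2.5, 2.0)]
front = pareto_filter(pois)
print([pois[i] for i in front])
```

Filtering of this kind reduces the number of candidate points a policy has to consider. This pairwise version is quadratic in the number of points, which is usually acceptable for small candidate sets.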
