We demonstrate the possibility of learning drone swarm controllers that are zero-shot transferable to real quadrotors via large-scale multi-agent end-to-end reinforcement learning. We train policies parameterized by neural networks that are capable of controlling individual drones in a swarm in a fully decentralized manner. Our policies, trained in simulated environments with realistic quadrotor physics, demonstrate advanced flocking behaviors, perform aggressive maneuvers in tight formations while avoiding collisions with each other, break and re-establish formations to avoid collisions with moving obstacles, and efficiently coordinate in pursuit-evasion tasks. We analyze, in simulation, how different model architectures and parameters of the training regime influence the final performance of neural swarms. We demonstrate the successful deployment of the model learned in simulation to highly resource-constrained physical quadrotors performing stationkeeping and goal swapping behaviors. Code and video demonstrations are available at the project website https://sites.google.com/view/swarm-rl.