We demonstrate the possibility of learning drone swarm controllers that are zero-shot transferable to real quadrotors via large-scale multi-agent end-to-end reinforcement learning. We train policies parameterized by neural networks that are capable of controlling individual drones in a swarm in a fully decentralized manner. Our policies, trained in simulated environments with realistic quadrotor physics, demonstrate advanced flocking behaviors, perform aggressive maneuvers in tight formations while avoiding collisions with each other, break and re-establish formations to avoid collisions with moving obstacles, and efficiently coordinate in pursuit-evasion tasks. We analyze, in simulation, how different model architectures and parameters of the training regime influence the final performance of neural swarms. We demonstrate the successful deployment of the model learned in simulation to highly resource-constrained physical quadrotors performing station keeping and goal swapping behaviors. Code and video demonstrations are available on the project website at https://sites.google.com/view/swarm-rl.
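For readers curious how such a decentralized controller might be structured, below is a minimal PyTorch sketch, not the paper's actual implementation: it assumes each drone observes its own state plus relative states of its K nearest neighbors, pools neighbor embeddings with a permutation-invariant mean, and outputs per-rotor thrust commands. All dimensions, layer sizes, and names here are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Illustrative dimensions -- assumptions, not taken from the paper.
SELF_OBS_DIM = 18      # e.g. position, velocity, rotation, angular velocity
NEIGHBOR_OBS_DIM = 6   # e.g. relative position and velocity of one neighbor
NUM_NEIGHBORS = 6      # K nearest neighbors visible to each drone
ACTION_DIM = 4         # one thrust command per rotor


class DecentralizedSwarmPolicy(nn.Module):
    """Per-drone policy: every drone runs its own copy using only local observations."""

    def __init__(self, hidden=64):
        super().__init__()
        self.self_encoder = nn.Sequential(
            nn.Linear(SELF_OBS_DIM, hidden), nn.Tanh(),
        )
        self.neighbor_encoder = nn.Sequential(
            nn.Linear(NEIGHBOR_OBS_DIM, hidden), nn.Tanh(),
        )
        self.head = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, ACTION_DIM),
        )

    def forward(self, self_obs, neighbor_obs):
        # self_obs: (batch, SELF_OBS_DIM)
        # neighbor_obs: (batch, NUM_NEIGHBORS, NEIGHBOR_OBS_DIM)
        self_emb = self.self_encoder(self_obs)
        # Mean pooling over neighbors makes the policy invariant to neighbor
        # ordering and robust to varying neighborhood sizes at deployment time.
        neighbor_emb = self.neighbor_encoder(neighbor_obs).mean(dim=1)
        return self.head(torch.cat([self_emb, neighbor_emb], dim=-1))


# Usage: one forward pass per drone per control step, fully decentralized.
policy = DecentralizedSwarmPolicy()
thrusts = policy(torch.randn(1, SELF_OBS_DIM),
                 torch.randn(1, NUM_NEIGHBORS, NEIGHBOR_OBS_DIM))
```

Keeping the policy small and purely feed-forward, as in this sketch, is what makes deployment on resource-constrained onboard hardware plausible; the trained network weights can be exported and run at the control rate on each quadrotor independently.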