Equipping drones with target search capabilities is desirable for applications such as disaster management and smart warehouse delivery. Compared with a single drone, an intelligent drone swarm whose members collaborate while maneuvering among obstacles can complete a target search in less time. In this work, we propose a data-efficient reinforcement learning-based approach, Adaptive Curriculum Embedded Multi-Stage Learning (ACEMSL), to address the challenges of collaborative target search with a visual drone swarm, namely exploration of a 3D sparse reward space and the need for collaborative behavior. Specifically, we develop an adaptive embedded curriculum in which the task difficulty level is adjusted according to the success rate achieved during training. Meanwhile, through multi-stage learning, ACEMSL enables data-efficient training and individual-team reward allocation for the collaborative drone swarm. The effectiveness and generalization capability of our approach are validated in simulations and real flight tests.
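To make the idea of a success-rate-driven curriculum concrete, the following is a minimal sketch in Python. It is not the ACEMSL implementation: the abstract only states that the difficulty level is adjusted according to the training success rate, so the class name `AdaptiveCurriculum`, the sliding window of recent episodes, the promotion and demotion thresholds, and the discrete difficulty levels are all assumptions made for illustration.

```python
from collections import deque


class AdaptiveCurriculum:
    """Hypothetical success-rate-driven curriculum (illustrative only).

    All parameters below are assumptions, not values from the paper.
    """

    def __init__(self, num_levels=5, window=20, promote_at=0.8, demote_at=0.3):
        self.num_levels = num_levels    # number of discrete difficulty levels
        self.window = window            # episodes used to estimate success rate
        self.promote_at = promote_at    # raise difficulty at or above this rate
        self.demote_at = demote_at      # lower difficulty at or below this rate
        self.level = 0                  # start at the easiest level
        self.outcomes = deque(maxlen=window)

    def success_rate(self):
        """Fraction of successful episodes in the current window."""
        if not self.outcomes:
            return 0.0
        return sum(self.outcomes) / len(self.outcomes)

    def record(self, success):
        """Log one episode outcome and return the (possibly updated) level."""
        self.outcomes.append(1 if success else 0)
        # Wait until the window is full before judging the current level.
        if len(self.outcomes) < self.window:
            return self.level
        rate = self.success_rate()
        if rate >= self.promote_at and self.level < self.num_levels - 1:
            self.level += 1
            self.outcomes.clear()  # re-estimate the rate at the new level
        elif rate <= self.demote_at and self.level > 0:
            self.level -= 1
            self.outcomes.clear()
        return self.level
```

In a training loop, one would call `record(success)` after each episode and use the returned level to configure the next episode, for example by changing the obstacle density or the initial distance to the target. Which environment parameters define difficulty is likewise an assumption here, since the abstract does not specify them.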