Path planning methods for autonomous unmanned aerial vehicles (UAVs) are typically designed for one specific type of mission. In this work, we present a method for autonomous UAV path planning based on deep reinforcement learning (DRL) that can be applied to a wide range of mission scenarios. Specifically, we compare coverage path planning (CPP), where the UAV's goal is to survey an area of interest, with data harvesting (DH), where the UAV collects data from distributed Internet of Things (IoT) sensor devices. By exploiting structured map information of the environment, we train double deep Q-networks (DDQNs) with identical architectures on both distinctly different mission scenarios to make movement decisions that balance the respective mission goal with navigation constraints. By introducing a novel approach that combines a compressed global map of the environment with a cropped but uncompressed local map showing the vicinity of the UAV agent, we demonstrate that the proposed method can efficiently scale to large environments. We also extend previous results on generalizing control policies that require no retraining when scenario parameters change, and offer a detailed analysis of the effects of crucial map-processing parameters on path planning performance.
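The global-plus-local map idea described above can be sketched as follows. This is a minimal illustration, not the paper's exact implementation: the function name, crop size, and pooling factor are assumptions chosen for clarity. The agent-centered map is split into a small uncompressed crop of the UAV's vicinity and an average-pooled (compressed) view of the whole environment.

```python
import numpy as np

def process_maps(global_map, agent_pos, local_size=17, pool=3):
    """Split a map observation into a cropped local view and a
    pooled (compressed) global view, both centered on the agent.

    global_map: (H, W, C) array of map layers (e.g. obstacles, targets).
    agent_pos:  (row, col) position of the UAV in the map.
    """
    h, w, c = global_map.shape
    # Pad so the agent can sit at the border and still get full crops.
    pad = max(local_size // 2, h, w)
    padded = np.pad(global_map, ((pad, pad), (pad, pad), (0, 0)))
    cy, cx = agent_pos[0] + pad, agent_pos[1] + pad

    # Local map: small, uncompressed crop around the agent.
    half = local_size // 2
    local = padded[cy - half: cy + half + 1, cx - half: cx + half + 1]

    # Global map: agent-centered view of the whole environment,
    # compressed by average pooling with factor `pool`.
    centered = padded[cy - h: cy + h, cx - w: cx + w]
    gh = centered.shape[0] // pool * pool
    gw = centered.shape[1] // pool * pool
    compressed = centered[:gh, :gw].reshape(
        gh // pool, pool, gw // pool, pool, c).mean(axis=(1, 3))
    return local, compressed
```

Feeding both views into the same network keeps the input size, and thus the DDQN architecture, independent of the environment's absolute dimensions, which is what allows scaling to large maps.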