Harvesting data from distributed Internet of Things (IoT) devices with multiple autonomous unmanned aerial vehicles (UAVs) is a challenging problem requiring flexible path planning methods. We propose a multi-agent reinforcement learning (MARL) approach that, in contrast to previous work, can adapt to profound changes in the scenario parameters defining the data harvesting mission, such as the number of deployed UAVs; the number, positions, and data amounts of IoT devices; or the maximum flying time, without the need to perform expensive recomputations or to relearn control policies. We formulate the path planning problem for a cooperative, non-communicating, and homogeneous team of UAVs tasked with maximizing collected data from distributed IoT sensor nodes subject to flying-time and collision-avoidance constraints. The path planning problem is translated into a decentralized partially observable Markov decision process (Dec-POMDP), which we solve through a deep reinforcement learning (DRL) approach, approximating the optimal UAV control policy without prior knowledge of the challenging wireless channel characteristics in dense urban environments. By exploiting a combination of centered global and local map representations of the environment that are fed into convolutional layers of the agents, we show that our proposed network architecture enables the agents to cooperate effectively by carefully dividing the data collection task among themselves, adapt to large, complex environments and state spaces, and make movement decisions that balance data collection goals, flight-time efficiency, and navigation constraints. Finally, learning a control policy that generalizes over the scenario parameter space enables us to analyze the influence of individual parameters on collection performance and provide some intuition about system-level benefits.
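To make the map-centering idea concrete, the following is a minimal sketch of how such a dual-branch map input pipeline could look. It is not the authors' implementation: the PyTorch framing, the `center_map` and `DualMapEncoder` names, the layer counts, kernel sizes, and the 17-cell local crop are all illustrative assumptions, and the paper's actual architecture and hyperparameters may differ.

```python
# Illustrative sketch (not the authors' code): a multi-channel environment map
# is re-centered on the acting UAV, then processed by two conv branches, a
# coarse global view and a detailed local crop. All sizes are assumptions.
import numpy as np
import torch
import torch.nn as nn


def center_map(env_map: np.ndarray, uav_pos: tuple[int, int]) -> np.ndarray:
    """Pad and shift a (C, H, W) feature map so the UAV sits at the center cell."""
    c, h, w = env_map.shape
    padded = np.zeros((c, 2 * h - 1, 2 * w - 1), dtype=env_map.dtype)
    y, x = uav_pos
    # Place the original map so cell (y, x) lands on the padded center (h-1, w-1).
    padded[:, h - 1 - y : 2 * h - 1 - y, w - 1 - x : 2 * w - 1 - x] = env_map
    return padded


class DualMapEncoder(nn.Module):
    """Two convolutional branches: a coarse global view plus a detailed local crop."""

    def __init__(self, channels: int, local_size: int = 17):
        super().__init__()
        self.local_size = local_size
        # Strided convolutions compress the full centered map (global context).
        self.global_conv = nn.Sequential(
            nn.Conv2d(channels, 16, kernel_size=5, stride=3), nn.ReLU(),
            nn.Conv2d(16, 16, kernel_size=3, stride=2), nn.ReLU(),
        )
        # Unstrided convolutions keep detail in the UAV's immediate neighborhood.
        self.local_conv = nn.Sequential(
            nn.Conv2d(channels, 16, kernel_size=3), nn.ReLU(),
            nn.Conv2d(16, 16, kernel_size=3), nn.ReLU(),
        )

    def forward(self, centered: torch.Tensor) -> torch.Tensor:
        g = self.global_conv(centered)
        # Crop a local_size x local_size window around the map center (= UAV position).
        k = self.local_size // 2
        cy, cx = centered.shape[-2] // 2, centered.shape[-1] // 2
        local = centered[..., cy - k : cy + k + 1, cx - k : cx + k + 1]
        l = self.local_conv(local)
        # Flattened features from both branches feed the downstream policy/Q-head.
        return torch.cat([g.flatten(1), l.flatten(1)], dim=1)


if __name__ == "__main__":
    env = np.random.rand(3, 32, 32).astype(np.float32)  # e.g. obstacle/device/UAV layers
    centered = center_map(env, uav_pos=(10, 5))          # -> (3, 63, 63)
    batch = torch.from_numpy(centered).unsqueeze(0)      # add batch dimension
    feats = DualMapEncoder(channels=3)(batch)
    print(feats.shape)  # torch.Size([1, 4000]) with these illustrative sizes
```

One motivation for centering is that convolutional layers are translation-equivariant: re-centering the map on the acting UAV lets the same filters be reused regardless of where the UAV is in the environment, which supports generalization across map sizes and scenario parameters.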