Path-planning methods for the autonomous control of Unmanned Aerial Vehicle (UAV) swarms are gaining traction as the number of scenarios requiring the coordinated operation of multiple UAVs grows. Most of these scenarios involve numerous obstacles, such as power lines or trees. Operating all UAVs autonomously reduces personnel costs, and optimal flight paths reduce energy consumption, leaving more battery time for other operations. In this paper, a Reinforcement Learning based system using Q-Learning is proposed to solve this path-planning problem in environments with obstacles. This method allows a model, in this case an Artificial Neural Network, to self-adjust by learning from its mistakes and successes. Regardless of the size of the map or the number of UAVs in the swarm, the goal of the computed paths is to guarantee complete coverage of an area containing fixed obstacles, for tasks such as field prospecting. No goal setting or prior information beyond the provided map is required. For experimentation, five maps of different sizes and with different obstacle layouts were used, and the experiments were run with varying numbers of UAVs. Results are evaluated by the total number of actions the UAVs take to complete the task in each experiment: the fewer the actions, the shorter the paths and the lower the energy consumption. The results are satisfactory, showing that the system finds solutions in fewer movements as the number of UAVs increases. For context, these results are compared against another state-of-the-art approach.
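The Q-Learning scheme the abstract describes can be illustrated with a minimal sketch. This is not the paper's system (which uses an Artificial Neural Network and multiple UAVs): it is a tabular single-UAV toy on a hypothetical 3x3 grid with one fixed obstacle, where the state is the UAV's cell plus the set of covered cells, new cells are rewarded, and blocked moves are penalized. Grid size, obstacle position, rewards, and hyperparameters are all assumptions for illustration.

```python
import random

GRID = 3                                   # hypothetical 3x3 map (the paper uses larger maps)
OBSTACLES = frozenset({(1, 1)})            # one fixed obstacle, chosen for illustration
FREE = frozenset((r, c) for r in range(GRID) for c in range(GRID)) - OBSTACLES
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def step(pos, covered, a):
    """Apply action a; reward new cells, penalize obstacle/off-map moves."""
    nxt = (pos[0] + ACTIONS[a][0], pos[1] + ACTIONS[a][1])
    if nxt not in FREE:
        return pos, covered, -1.0          # blocked: stay put, penalty
    reward = 1.0 if nxt not in covered else -0.1
    return nxt, covered | {nxt}, reward

def train(episodes=3000, alpha=0.5, gamma=0.9, eps=0.3, seed=0):
    rng = random.Random(seed)
    Q = {}  # (position, covered cells, action) -> estimated value
    for _ in range(episodes):
        pos, covered = (0, 0), frozenset({(0, 0)})
        for _ in range(6 * len(FREE)):     # step budget per episode
            if covered == FREE:            # full coverage reached
                break
            if rng.random() < eps:         # epsilon-greedy exploration
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)),
                        key=lambda i: Q.get((pos, covered, i), 0.0))
            nxt, new_cov, r = step(pos, covered, a)
            best_next = 0.0 if new_cov == FREE else max(
                Q.get((nxt, new_cov, i), 0.0) for i in range(len(ACTIONS)))
            old = Q.get((pos, covered, a), 0.0)
            # Standard Q-Learning update rule
            Q[(pos, covered, a)] = old + alpha * (r + gamma * best_next - old)
            pos, covered = nxt, new_cov
    return Q

def greedy_rollout(Q, limit=200):
    """Follow the learned policy greedily; return (actions taken, full coverage?)."""
    pos, covered, steps = (0, 0), frozenset({(0, 0)}), 0
    while covered != FREE and steps < limit:
        a = max(range(len(ACTIONS)),
                key=lambda i: Q.get((pos, covered, i), 0.0))
        pos, covered, _ = step(pos, covered, a)
        steps += 1
    return steps, covered == FREE

Q = train()
steps, full = greedy_rollout(Q)
```

The action count returned by `greedy_rollout` mirrors the paper's evaluation metric: fewer actions means shorter paths and lower energy consumption. The paper replaces the Q-table with a neural network precisely because, as here, the state space (position x covered set) grows exponentially with map size.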