Unmanned aerial vehicle (UAV)-assisted data collection has emerged as a prominent application due to its flexibility, mobility, and low operational cost. However, under the dynamics and uncertainties of IoT data collection and energy replenishment processes, optimizing the performance of UAV collectors is a very challenging task. Thus, this paper introduces a novel framework that jointly optimizes the flying speed and energy replenishment for each UAV to significantly improve the data collection performance. Specifically, we first formulate a Markov decision process to help the UAV automatically and dynamically make optimal decisions under the dynamics and uncertainties of the environment. We then propose a highly effective reinforcement learning algorithm leveraging deep Q-learning, double deep Q-learning, and a deep dueling neural network architecture to quickly obtain the UAV's optimal policy. The core ideas of this algorithm are to estimate the state values and action advantages separately and simultaneously, and to employ double estimators for the action values. Together, these techniques stabilize the learning process and effectively address the overestimation problem of conventional Q-learning algorithms. To further reduce the learning time and significantly improve the learning quality, we develop advanced transfer learning techniques that allow UAVs to ``share'' and ``transfer'' learning knowledge. Extensive simulations demonstrate that our proposed solution can improve the average data collection performance of the system by up to 200% compared with that of current methods.
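The abstract describes the algorithm only at a high level; the sketch below is a minimal, hypothetical illustration (in PyTorch, with made-up layer sizes and function names such as `DuelingQNetwork` and `double_q_target`) of the two ideas it highlights: a dueling head that estimates the state value and action advantages in separate streams, and a double estimator in which the online network selects the greedy action while a target network evaluates it, mitigating the overestimation of conventional Q-learning. It is not the authors' implementation.

```python
import torch
import torch.nn as nn

class DuelingQNetwork(nn.Module):
    """Dueling architecture: estimate the state value V(s) and the action
    advantages A(s, a) in separate streams, then combine them into Q(s, a)."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)              # V(s)
        self.advantage = nn.Linear(hidden, n_actions)  # A(s, a)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.feature(state)
        v, a = self.value(h), self.advantage(h)
        # Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')  (identifiability constraint)
        return v + a - a.mean(dim=1, keepdim=True)

def double_q_target(online_net, target_net, reward, next_state, done, gamma=0.99):
    """Double estimator: the online network picks the greedy next action,
    the target network evaluates it, reducing Q-value overestimation."""
    with torch.no_grad():
        greedy_action = online_net(next_state).argmax(dim=1, keepdim=True)
        next_q = target_net(next_state).gather(1, greedy_action).squeeze(1)
        return reward + gamma * (1.0 - done) * next_q
```

In this sketch the UAV's state (e.g., its position, battery level, and queue observations) would be encoded in `state`, and the discrete actions would correspond to the joint flying-speed and energy-replenishment decisions mentioned in the abstract; those encodings are assumptions, not details taken from the paper.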