Unmanned Aerial Vehicle (UAV) swarms play an effective role in timely data collection from ground sensors in remote and hostile areas. Optimizing the collective behavior of swarms can improve data collection performance. This paper puts forth a new mean field flight resource allocation optimization to minimize age of information (AoI) of sensory data, where balancing the trade-off between the UAVs movements and AoI is formulated as a mean field game (MFG). The MFG optimization yields an expansive solution space encompassing continuous state and action, resulting in significant computational complexity. To address practical situations, we propose, a new mean field hybrid proximal policy optimization (MF-HPPO) scheme to minimize the average AoI by optimizing the UAV's trajectories and data collection scheduling of the ground sensors given mixed continuous and discrete actions. Furthermore, a long short term memory (LSTM) is leveraged in MF-HPPO to predict the time-varying network state and stabilize the training. Numerical results demonstrate that the proposed MF-HPPO reduces the average AoI by up to 45 percent and 57 percent in the considered simulation setting, as compared to multi-agent deep Q-learning (MADQN) method and non-learning random algorithm, respectively.
翻译:暂无翻译