Maintaining the freshness of information in the Internet of Things (IoT) is a critical yet challenging problem. In this paper, we study cooperative data collection using multiple Unmanned Aerial Vehicles (UAVs) with the objective of minimizing the total average Age of Information (AoI). We account for several constraints on the UAVs, including kinematic, energy, trajectory, and collision-avoidance constraints, when optimizing the data collection process. Specifically, each UAV, which has limited on-board energy, takes off from its initial location and flies over sensor nodes to collect update packets in cooperation with the other UAVs. Each UAV must land at its final destination with non-negative residual energy when the specified time duration ends, so that its energy budget is sufficient to complete the mission. To enhance information freshness, the trajectories of the UAVs and the transmission scheduling of the sensor nodes must be designed jointly. We model the multi-UAV data collection problem as a Decentralized Partially Observable Markov Decision Process (Dec-POMDP), since each UAV is unaware of the dynamics of the environment and can observe only a subset of the sensors. To address these challenges, we propose a multi-agent Deep Reinforcement Learning (DRL)-based algorithm with centralized learning and decentralized execution. In addition to reward shaping, we use action masks to filter out invalid actions and ensure that the constraints are met. Simulation results demonstrate that the proposed algorithm significantly reduces the total average AoI compared to the baseline algorithms, and that the action mask method improves the convergence speed of the proposed algorithm.
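To illustrate how an action mask can filter out invalid actions, the following is a minimal Python sketch, not the paper's implementation. It assumes a discrete action set and a hypothetical energy rule: a move is allowed only if the UAV can still reach its destination afterwards. The function names (`masked_softmax`, `energy_mask`) and all numeric values are assumptions made for illustration.

```python
import math


def masked_softmax(logits, mask):
    """Return action probabilities with invalid actions forced to zero.

    logits: one float per action (stand-in for a policy network output).
    mask:   one bool per action, True where the action is allowed.
    """
    if not any(mask):
        raise ValueError("at least one action must be valid")
    # Subtract the largest valid logit for numerical stability.
    peak = max(l for l, ok in zip(logits, mask) if ok)
    weights = [math.exp(l - peak) if ok else 0.0
               for l, ok in zip(logits, mask)]
    total = sum(weights)
    return [w / total for w in weights]


def energy_mask(move_costs, return_costs, residual_energy):
    """Hypothetical rule: a move is valid only if, after paying for it,
    the UAV still has enough energy to fly to its final destination."""
    return [move + back <= residual_energy
            for move, back in zip(move_costs, return_costs)]


# Illustrative numbers only (not taken from the paper).
mask = energy_mask(move_costs=[1.0, 1.0, 2.0],
                   return_costs=[3.0, 4.5, 2.5],
                   residual_energy=5.0)
probs = masked_softmax([0.2, 1.5, -0.3], mask)
```

Here the second action would leave the UAV unable to reach its destination, so its probability is zero and the remaining probability mass is renormalized over the valid actions. Masking at the policy output means a constraint-violating action can never be sampled, whereas reward shaping alone only penalizes such actions after they are taken.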