Energy-aware placement optimization of UAV base stations via decentralized multi-agent Q-learning

Unmanned aerial vehicles serving as aerial base stations (UAV-BSs) can be deployed to provide wireless connectivity to ground devices during surges in network demand, failures in existing infrastructure, or disasters. However, conserving UAV energy during prolonged coverage tasks is challenging because on-board battery capacity is limited. Reinforcement learning (RL) approaches have previously been used to improve the energy utilization of multiple UAVs, but they assume a central cloud controller with complete knowledge of the end devices' locations; that is, the controller periodically scans the network and sends updates for UAV decision-making. This assumption is impractical in dynamic network environments with mobile ground devices. To address this problem, we propose a decentralized Q-learning approach in which each UAV-BS is equipped with an autonomous agent that maximizes connectivity to ground devices while improving its energy utilization. Experimental results show that the proposed design significantly outperforms the centralized approaches in jointly maximizing the number of connected ground devices and the energy utilization of the UAV-BSs.
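To make the decentralized idea concrete, the following is a minimal sketch of independent tabular Q-learning in which each UAV-BS keeps its own Q-table and learns only from its local observation. The abstract does not specify the state, action, or reward design, so everything here is an illustrative assumption rather than the authors' method: the grid world, the `UAVAgent` class, the `count_connected` and `step` helpers, the coverage radius, and the energy costs are all hypothetical.

```python
import math
import random
from collections import defaultdict

# Hypothetical action set on a 2-D grid: hover, east, west, north, south.
ACTIONS = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]


class UAVAgent:
    """One independent learner per UAV-BS (illustrative, not the paper's design)."""

    def __init__(self, position, alpha=0.5, gamma=0.9, epsilon=0.1, seed=None):
        self.position = position
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        # Q-table private to this agent: state -> list of action values.
        self.q = defaultdict(lambda: [0.0] * len(ACTIONS))
        self.rng = random.Random(seed)

    def choose_action(self, state):
        # Epsilon-greedy: explore with probability epsilon, else act greedily.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(ACTIONS))
        values = self.q[state]
        return values.index(max(values))

    def update(self, state, action, reward, next_state):
        # Standard Q-learning temporal-difference update.
        td_target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (td_target - self.q[state][action])


def count_connected(position, devices, radius):
    """Number of ground devices within the assumed coverage radius."""
    return sum(1 for d in devices if math.dist(position, d) <= radius)


def step(agent, devices, grid_size=10, radius=2.0, move_cost=1.0, hover_cost=0.5):
    """One local decision step; the costs are placeholder values, not measurements."""
    state = agent.position
    action = agent.choose_action(state)
    dx, dy = ACTIONS[action]
    # Clamp the move so the UAV stays inside the grid.
    next_state = (min(max(state[0] + dx, 0), grid_size - 1),
                  min(max(state[1] + dy, 0), grid_size - 1))
    energy = hover_cost if action == 0 else move_cost
    # Assumed reward: devices served minus energy spent on this step.
    reward = count_connected(next_state, devices, radius) - energy
    agent.update(state, action, reward, next_state)
    agent.position = next_state
    return reward


if __name__ == "__main__":
    devices = [(2, 3), (7, 7), (8, 6)]
    agents = [UAVAgent((0, 0), seed=1), UAVAgent((9, 9), seed=2)]
    for _ in range(500):
        for agent in agents:  # no central controller: each agent acts on its own
            step(agent, devices)
    print([agent.position for agent in agents])
```

In this sketch no agent reads another agent's Q-table or a global device map, which is the property the abstract contrasts with a centralized controller. A fuller model would also need to account for interference and overlapping coverage between neighboring UAV-BSs, which this example ignores.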
