Designing clustered unmanned aerial vehicle (UAV) communication networks around cognitive radio (CR) and reinforcement learning can make these networks considerably more intelligent and more robust in time-varying environments. Within CR, smarter spectrum sensing and access is a key research problem. We therefore focus on dynamic cooperative spectrum sensing and channel access in clustered cognitive UAV (CUAV) communication networks. Because prior statistical information on the primary user (PU) channel occupancy state is unavailable, we use multi-agent reinforcement learning (MARL) to model the CUAVs' spectrum competition and cooperative decision-making in this dynamic scenario, and we introduce a return function, formed as a weighted combination of sensing-transmission cost and utility, to characterize the real-time rewards of the multi-agent game. On this basis, we propose three algorithms: a time-slot multi-round revisit exhaustive search algorithm based on a virtual controller (VC-EXH), a Q-learning algorithm based on independent learners (IL-Q), and a deep Q-learning algorithm based on independent learners (IL-DQN). We then briefly analyze the information exchange overhead, execution complexity, and convergence of the three algorithms. Numerical simulations show that all three algorithms converge quickly, significantly improve system performance, and increase the utilization of idle spectrum resources.
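To make the independent-learner idea concrete, the following is a minimal sketch of what an IL-Q style scheme could look like; it is not the paper's algorithm. Everything specific here is an assumption made for illustration: the stateless (bandit-style) Q update, the epsilon-greedy policy, the Bernoulli model of PU idleness, the collision rule, and the names `reward`, `IndependentQLearner`, `simulate`, `w_utility`, `w_cost`, and `sense_cost`. The reward is one possible reading of "a weighted combination of sensing-transmission cost and utility".

```python
import random


def reward(idle, collided, w_utility=1.0, w_cost=0.2, sense_cost=1.0):
    """Hypothetical reward: weighted utility minus weighted sensing/transmission cost."""
    # Assumed utility: 1 for a successful transmission on an idle channel, else 0.
    utility = 1.0 if (idle and not collided) else 0.0
    return w_utility * utility - w_cost * sense_cost


class IndependentQLearner:
    """One CUAV agent that learns its own Q-values without seeing other agents' choices."""

    def __init__(self, n_channels, alpha=0.1, epsilon=0.1, rng=None):
        self.q = [0.0] * n_channels      # one Q-value per PU channel
        self.alpha = alpha               # learning rate
        self.epsilon = epsilon           # exploration probability
        self.rng = rng or random.Random()

    def act(self):
        # Epsilon-greedy: explore a random channel, otherwise pick a best-valued one.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.q))
        best = max(self.q)
        return self.rng.choice([c for c, v in enumerate(self.q) if v == best])

    def update(self, channel, r):
        # Stateless Q-learning update: Q <- Q + alpha * (r - Q).
        self.q[channel] += self.alpha * (r - self.q[channel])


def simulate(idle_prob, n_agents=2, n_slots=5000, seed=0):
    """Run the agents against PU channels whose idle probabilities are unknown to them."""
    rng = random.Random(seed)
    agents = [IndependentQLearner(len(idle_prob), rng=rng) for _ in range(n_agents)]
    for _ in range(n_slots):
        # Assumed PU model: each channel is idle independently in every slot.
        idle = [rng.random() < p for p in idle_prob]
        choices = [agent.act() for agent in agents]
        for agent, ch in zip(agents, choices):
            # Assumed collision rule: two agents on the same channel both fail.
            collided = choices.count(ch) > 1
            agent.update(ch, reward(idle[ch], collided))
    return [agent.q for agent in agents]


if __name__ == "__main__":
    for i, q in enumerate(simulate([0.2, 0.9, 0.6])):
        print(f"agent {i}: " + ", ".join(f"{v:.2f}" for v in q))
```

Each agent updates only from its own reward, so no information is exchanged between learners in this sketch. A deep variant in the spirit of IL-DQN would replace the per-channel table with a function approximator, which is outside the scope of this example.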