This paper studies distributed spectrum/channel access for cognitive radio-enabled unmanned aerial vehicles (CUAVs) that overlay primary channels. Under a framework of cooperative spectrum sensing and opportunistic transmission, channel allocation is formulated as a one-shot optimization problem that maximizes the expected cumulative weighted reward of multiple CUAVs. To handle the uncertainty caused by the lack of prior knowledge about primary user activity and the absence of a channel-access coordinator, the original problem is recast as a competition-and-cooperation hybrid multi-agent reinforcement learning (CCH-MARL) problem within the framework of a Markov game (MG). A value-iteration-based RL algorithm featuring upper confidence bound-Hoeffding (UCB-H) strategy search is then proposed, in which each CUAV is treated as an independent learner (IL). To address the curse of dimensionality, the UCB-H strategy is further extended with a double deep Q-network (DDQN). Numerical simulations show that the proposed algorithms converge efficiently to stable strategies and significantly improve network performance compared with benchmark algorithms such as vanilla Q-learning and DDQN.
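As a rough illustration of the independent-learner idea, the sketch below implements the generic tabular Q-learning update with a UCB-Hoeffding bonus as it is commonly stated in the RL literature: optimistic initialisation at the horizon H, learning rate (H+1)/(H+t), and an exploration bonus proportional to sqrt(H^3 * iota / t). It is not the algorithm of this paper. The toy channel model (i.i.d. idle probabilities, a single dummy state, zero reward on collision), the names `UCBHLearner` and `run`, and the constants `c` and `iota` are all assumptions made for illustration.

```python
import math
import random


class UCBHLearner:
    """Tabular Q-learning with a UCB-Hoeffding bonus for one independent agent."""

    def __init__(self, n_states, n_actions, horizon, c=0.1, iota=1.0, seed=0):
        self.H = horizon
        self.c = c          # bonus scale (assumed value, not from the paper)
        self.iota = iota    # stands in for the log factor in the bonus
        self.rng = random.Random(seed)
        # Optimistic initialisation: every Q-value starts at the horizon H.
        self.q = [[[float(horizon)] * n_actions for _ in range(n_states)]
                  for _ in range(horizon)]
        # Visit counts per (step, state, action).
        self.n = [[[0] * n_actions for _ in range(n_states)]
                  for _ in range(horizon)]

    def value(self, h, s):
        # V_h(s) = min(H, max_a Q_h(s, a)); the value past the last step is 0.
        if h >= self.H:
            return 0.0
        return min(float(self.H), max(self.q[h][s]))

    def act(self, h, s):
        # Greedy on the optimistic Q-values, with random tie-breaking so that
        # identical learners do not stay locked onto the same channel.
        row = self.q[h][s]
        best = max(row)
        return self.rng.choice([a for a, q in enumerate(row) if q == best])

    def update(self, h, s, a, reward, s_next):
        self.n[h][s][a] += 1
        t = self.n[h][s][a]
        alpha = (self.H + 1) / (self.H + t)                      # learning rate
        bonus = self.c * math.sqrt(self.H ** 3 * self.iota / t)  # UCB-H bonus
        target = reward + self.value(h + 1, s_next) + bonus
        self.q[h][s][a] = (1 - alpha) * self.q[h][s][a] + alpha * target


def run(episodes=2000, horizon=5, idle_prob=(0.2, 0.9, 0.6), seed=0):
    """Two independent learners share a toy set of primary channels."""
    rng = random.Random(seed)
    n_channels = len(idle_prob)
    agents = [UCBHLearner(1, n_channels, horizon, seed=seed + i + 1)
              for i in range(2)]
    total = 0.0
    for _ in range(episodes):
        s = 0  # single dummy state: the toy model has no channel memory
        for h in range(horizon):
            actions = [agent.act(h, s) for agent in agents]
            idle = [rng.random() < p for p in idle_prob]
            for agent, a in zip(agents, actions):
                collided = actions.count(a) > 1
                # Reward 1 only if the channel is idle and no other CUAV chose it.
                r = 1.0 if idle[a] and not collided else 0.0
                agent.update(h, s, a, r, s)
                total += r
    # Average reward per step, summed over both agents.
    return agents, total / (episodes * horizon)
```

Calling `run()` returns the two learners and the average per-step reward summed over both agents. Because each learner treats the other as part of the environment, nothing in this sketch guarantees that they settle on different channels; it only shows the shape of the update that the UCB-H strategy builds on.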