Multi-Agent Reinforcement Learning for Joint Cooperative Spectrum Sensing and Channel Access in Cognitive UAV Networks

This paper studies distributed spectrum/channel access for cognitive radio-enabled unmanned aerial vehicles (CUAVs) that overlay upon primary channels. Within a framework of cooperative spectrum sensing and opportunistic transmission, channel allocation is formulated as a one-shot optimization problem that maximizes the expected cumulative weighted reward of multiple CUAVs. To handle the uncertainty arising from the lack of prior knowledge about primary user activity and the absence of a channel-access coordinator, the original problem is recast as a competition and cooperation hybrid multi-agent reinforcement learning (CCH-MARL) problem in the framework of a Markov game (MG). A value-iteration-based RL algorithm featuring upper confidence bound-Hoeffding (UCB-H) strategy searching is then proposed, in which each CUAV is treated as an independent learner (IL). To address the curse of dimensionality, the UCB-H strategy is further extended with a double deep Q-network (DDQN). Numerical simulations show that the proposed algorithms converge efficiently to stable strategies and significantly improve network performance compared with benchmark algorithms such as vanilla Q-learning and DDQN.
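The independent-learner idea with UCB-H exploration can be illustrated with a small tabular sketch. It assumes the episodic UCB-Hoeffding Q-learning update of Jin et al. (2018): learning rate alpha_t = (H+1)/(H+t), bonus c*sqrt(H^3 * iota / t), optimistic initialization Q = H, and V_h(s) = min(H, max_a Q_h(s, a)). The paper's exact update, weighted reward, and state definition may differ. The toy environment is entirely hypothetical, as are the names `UCBHLearner` and `simulate`. In it, each primary channel is a two-state Markov chain, the state is the shared (cooperatively sensed) busy/idle vector from the previous slot, and a CUAV earns reward 1 only when it transmits alone on an idle channel.

```python
import math
import random
from collections import defaultdict


class UCBHLearner:
    """Hypothetical independent learner: episodic Q-learning with a UCB-Hoeffding bonus."""

    def __init__(self, n_actions, horizon, n_states, n_episodes, c=1.0, p_fail=0.1):
        self.nA = n_actions
        self.H = horizon
        self.c = c
        # Log factor iota = log(S * A * T / p) used in the Hoeffding-style bonus
        self.iota = math.log(n_states * n_actions * horizon * n_episodes / p_fail)
        # Optimistic initialization Q_h(s, a) = H; keys are (step h, state s)
        self.Q = defaultdict(lambda: [float(horizon)] * n_actions)
        self.N = defaultdict(lambda: [0] * n_actions)

    def act(self, h, s):
        # Greedy w.r.t. the optimistic Q-values, ties broken at random
        q = self.Q[(h, s)]
        best = max(q)
        return random.choice([a for a, v in enumerate(q) if v == best])

    def value(self, h, s):
        # V_H = 0 after the last step; otherwise V_h(s) = min(H, max_a Q_h(s, a))
        if h >= self.H:
            return 0.0
        return min(float(self.H), max(self.Q[(h, s)]))

    def update(self, h, s, a, r, s_next):
        self.N[(h, s)][a] += 1
        t = self.N[(h, s)][a]
        alpha = (self.H + 1) / (self.H + t)
        bonus = self.c * math.sqrt(self.H ** 3 * self.iota / t)
        target = r + self.value(h + 1, s_next) + bonus
        q = self.Q[(h, s)]
        q[a] = (1 - alpha) * q[a] + alpha * target


def simulate(n_agents=2, n_channels=3, horizon=5, n_episodes=2000,
             p_stay_busy=0.8, p_become_busy=0.2, c=0.05, seed=0):
    """Toy channel-access game; returns the average reward per agent per slot."""
    random.seed(seed)
    n_states = 2 ** n_channels
    agents = [UCBHLearner(n_channels, horizon, n_states, n_episodes, c=c)
              for _ in range(n_agents)]
    total = 0.0
    for _ in range(n_episodes):
        busy = tuple(random.random() < 0.5 for _ in range(n_channels))
        for h in range(horizon):
            s = busy  # shared sensing result from the previous slot
            actions = [ag.act(h, s) for ag in agents]
            # Primary users evolve as independent two-state Markov chains
            busy = tuple(random.random() < (p_stay_busy if b else p_become_busy)
                         for b in busy)
            for ag, a in zip(agents, actions):
                collided = actions.count(a) > 1
                r = 1.0 if (not busy[a] and not collided) else 0.0
                ag.update(h, s, a, r, busy)
                total += r
    return total / (n_episodes * horizon * n_agents)
```

Each agent updates only its own table and observes the others solely through collisions, which is what makes it an independent learner. The competitive/cooperative coupling shows up only in the shared reward structure. Calling `simulate()` returns a throughput-like value in [0, 1]. Replacing the table with a neural network and a double-Q target is the direction the abstract describes for DDQN, and it is not shown here.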

翻译：本文研究了在初级频道上重叠的认知式无线电辅助无人驾驶飞行器(CUAVs)的分布式频谱/通道接入问题。在合作式频谱感和机会性传输的框架内,制定了频道分配的一次性优化问题,目的是最大限度地实现对多个CUAV的预期累积加权奖励。为了处理由于事先不了解主要用户活动以及缺乏频道接入协调员而造成的不确定性,最初的问题被推到了马尔科夫游戏(MG)框架内的多试剂强化混合学习(CCH-MARL)的竞争与合作问题中。然后,提出了一种基于价值的RL算法,其特点是将每个CUBAVAV作为独立的学习者(IL)来对待,目的是解决对维度的诅咒,UCB-H战略进一步扩展,同时采用了双深的Q-网络(DQN)。 Numical模拟表明,拟议的算法能够有效地与稳定的战略趋同,在与基准性算法和DDQ等基准学习时,大大改进网络的绩效。

相关内容

Cognition

关注 4

Cognition：Cognition：International Journal of Cognitive Science Explanation：认知：国际认知科学杂志。 Publisher：Elsevier。 SIT： http://www.journals.elsevier.com/cognition/

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

专知会员服务

85+阅读 · 2020年2月18日

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

专知会员服务

15+阅读 · 2019年10月23日

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

163+阅读 · 2019年10月12日