With the development of 5G and the Internet of Things, a large number of wireless devices need to share limited spectrum resources. Dynamic spectrum access (DSA) is a promising paradigm to remedy the inefficient spectrum utilization brought about by the historical command-and-control approach to spectrum allocation. In this paper, we investigate the distributed multi-user DSA problem in a typical multi-channel cognitive radio network. The problem is formulated as a decentralized partially observable Markov decision process (Dec-POMDP), and we propose a centralized offline training and distributed online execution framework based on cooperative multi-agent reinforcement learning (MARL). We employ a deep recurrent Q-network (DRQN) to address the partial observability of the state for each cognitive user. The ultimate goal is to learn a cooperative strategy that maximizes the sum throughput of the cognitive radio network in a distributed fashion, without the exchange of coordination information between cognitive users. Finally, we validate the proposed algorithm in various settings through extensive experiments. The simulation results show that the proposed algorithm converges fast and achieves near-optimal performance.
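To make the DRQN-based execution phase concrete, the following is a minimal sketch of one cognitive user's recurrent Q-network at decision time. All sizes, the observation encoding, and the GRU-cell architecture are illustrative assumptions, not the paper's exact design; the recurrent hidden state is what lets the agent act on a history of partial observations rather than the full network state.

```python
import numpy as np

# Hypothetical problem sizes (assumptions for illustration only).
N_CHANNELS = 4            # actions: which channel to access
OBS_DIM = N_CHANNELS + 1  # e.g. one-hot of last channel + ACK bit
HIDDEN = 16               # recurrent hidden-state size

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Randomly initialized GRU-cell weights; the centralized offline
# training loop that would fit these is omitted here.
Wz = rng.standard_normal((HIDDEN, OBS_DIM + HIDDEN)) * 0.1
Wr = rng.standard_normal((HIDDEN, OBS_DIM + HIDDEN)) * 0.1
Wh = rng.standard_normal((HIDDEN, OBS_DIM + HIDDEN)) * 0.1
Wq = rng.standard_normal((N_CHANNELS, HIDDEN)) * 0.1

def drqn_step(obs, h):
    """One recurrent step: fold the partial observation into the
    hidden state, then emit one Q-value per channel action."""
    xh = np.concatenate([obs, h])
    z = sigmoid(Wz @ xh)                              # update gate
    r = sigmoid(Wr @ xh)                              # reset gate
    h_tilde = np.tanh(Wh @ np.concatenate([obs, r * h]))
    h_new = (1.0 - z) * h + z * h_tilde
    return Wq @ h_new, h_new

# Distributed online execution: each user runs its own copy of the
# trained network on purely local observations, with no message
# exchange between users.
h = np.zeros(HIDDEN)
for t in range(3):
    obs = rng.standard_normal(OBS_DIM)  # placeholder local observation
    q, h = drqn_step(obs, h)
    action = int(np.argmax(q))          # greedy channel choice
```

Because each user carries only its own hidden state and local observation history, this execution step matches the "no coordination information exchange" requirement; cooperation is induced only through the shared centralized training phase.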