We consider the problem of dynamic spectrum access (DSA) in cognitive wireless networks, where only partial observations are available to the users due to narrowband sensing and transmissions. The cognitive network consists of primary users (PUs) and a secondary user (SU), which operate in a time-duplexing regime. The traffic pattern of each PU is assumed to be unknown to the SU and is modeled as a finite-memory Markov chain. Since observations are partial, both channel sensing and access actions affect the throughput. The objective is to maximize the SU's long-term throughput. To achieve this goal, we develop a novel algorithm, dubbed Double Deep Q-network for Sensing and Access (DDQSA), that learns both access and sensing policies via deep Q-learning. To the best of our knowledge, this is the first paper that solves for both sensing and access policies for DSA via deep Q-learning. In addition, we analyze the optimal policy theoretically to validate the performance of DDQSA. Although the general DSA problem is PSPACE-hard, we derive the optimal policy explicitly for a common model of cyclic user dynamics. Our results show that DDQSA learns a policy that implements both sensing and channel access, and significantly outperforms existing approaches.
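The abstract does not specify DDQSA's network architecture or action encoding. As a minimal sketch, assuming a joint action that pairs a sensed channel with an access decision (access = -1 meaning "stay idle"; this encoding and the names `joint_actions`, `double_q_target`, `q_online`, and `q_target` are hypothetical), the standard Double DQN target, in which the online network selects the next action and the target network evaluates it, can be written as follows. Plain callables stand in for the neural networks.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

# Hypothetical encoding: (channel to sense, channel to access); access = -1 means stay idle.
Action = Tuple[int, int]
QFunction = Callable[[object], Sequence[float]]  # stand-in for a Q-network


def joint_actions(num_channels: int) -> List[Action]:
    """Enumerate every (sense, access) pair so one Q-value scores both decisions."""
    return list(product(range(num_channels), range(-1, num_channels)))


def double_q_target(reward: float, next_obs: object, gamma: float,
                    q_online: QFunction, q_target: QFunction) -> float:
    """Double DQN target for a continuing task (no terminal states).

    The online network selects the greedy next action; the target network
    evaluates it, which reduces the overestimation bias of a plain max.
    """
    online_values = q_online(next_obs)
    best = max(range(len(online_values)), key=lambda i: online_values[i])
    return reward + gamma * q_target(next_obs)[best]


if __name__ == "__main__":
    actions = joint_actions(3)
    print(len(actions), actions[:2])  # 12 [(0, -1), (0, 0)]
    y = double_q_target(1.0, "obs", 0.5,
                        q_online=lambda o: [0.1, 0.5, 0.2],
                        q_target=lambda o: [1.0, 2.0, 3.0])
    print(y)  # 2.0 (a plain max over q_target would give 2.5)
```

Here sensing and access share one Q-value per joint action; whether DDQSA uses a single joint output head or separate heads for the two decisions is not stated in the abstract.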