The increasing number of wireless devices operating in unlicensed spectrum motivates the development of intelligent adaptive approaches to spectrum access. We consider decentralized contention-based medium access for base stations (BSs) operating on unlicensed shared spectrum, where each BS autonomously decides whether to transmit on a given resource. The contention decision aims to maximize not the BS's own downlink throughput, but rather a network-wide objective. We formulate this problem as a decentralized partially observable Markov decision process with a novel reward structure that provides long-term proportional fairness in terms of throughput. We then introduce a two-stage Markov decision process in each time slot that uses information from spectrum sensing and reception quality to make a medium access decision. Finally, we incorporate these features into a distributed reinforcement learning framework for contention-based spectrum access. Our formulation provides decentralized inference and online adaptability, and accommodates partial observability of the environment through recurrent Q-learning. Empirically, we find its maximization of the proportional fairness metric to be competitive with a genie-aided adaptive energy detection threshold, while remaining robust to channel fading and small contention windows.
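For reference, the long-term proportional-fairness objective invoked above is conventionally expressed as follows; this is the standard textbook definition, and the paper's exact reward shaping may differ in its details:

```latex
% Standard long-term proportional-fairness objective over N base stations,
% where \bar{R}_i denotes the long-term average downlink throughput of BS i.
\max \;\; \sum_{i=1}^{N} \log \bar{R}_i
```

Maximizing the sum of log-throughputs (rather than the sum of throughputs) trades peak rate for fairness: a transmission decision that starves any single BS drives its term toward $-\infty$, so no BS's long-term average throughput can be sacrificed entirely for aggregate gain.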