The increasing number of wireless devices operating in unlicensed spectrum motivates the development of intelligent, adaptive approaches to spectrum access that go beyond traditional carrier sensing. We develop a novel distributed implementation of Proximal Policy Optimization (PPO), a policy gradient method, modelled on a two-stage Markov decision process; this enables such an intelligent approach while still achieving decentralized, contention-based medium access. In each time slot, a base station (BS) uses information from spectrum sensing and reception quality to autonomously decide whether or not to transmit on a given resource, with the goal of maximizing network-wide proportional fairness. Empirically, we find that the proportional fairness reward accumulated by the policy gradient approach is significantly higher than that achieved even with a genie-aided adaptive energy detection threshold. This is further validated by the improved sum and maximum user throughputs achieved by our approach.
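The abstract does not spell out the reward or the training objective. As a minimal sketch, assume the textbook proportional-fairness utility (the sum of logarithms of per-user average throughputs) and the standard PPO clipped surrogate objective. Under those assumptions, the two quantities can be written as below. The function names and the clipping parameter `eps=0.2` are illustrative choices, not the authors' implementation.

```python
import math

# ASSUMPTION: the textbook proportional-fairness utility; the paper's exact
# reward definition is not given in the abstract.
def proportional_fairness(avg_throughputs):
    """Network-wide PF utility: sum of log of each user's average throughput.

    Every average throughput must be strictly positive.
    """
    return sum(math.log(r) for r in avg_throughputs)

# ASSUMPTION: the standard PPO-Clip surrogate; eps=0.2 is a common default,
# not a value taken from the paper.
def ppo_clipped_objective(ratios, advantages, eps=0.2):
    """Mean PPO clipped surrogate objective over a batch of samples.

    ratios[i]     = pi_new(a_i | s_i) / pi_old(a_i | s_i)
    advantages[i] = advantage estimate for sample i
    """
    total = 0.0
    for r, a in zip(ratios, advantages):
        # Clip the probability ratio into [1 - eps, 1 + eps].
        clipped = min(max(r, 1.0 - eps), 1.0 + eps)
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(r * a, clipped * a)
    return total / len(ratios)
```

Both allocations `[2.0, 2.0]` and `[3.9, 0.1]` deliver the same sum throughput. However, `proportional_fairness` scores the balanced one higher, which is why a PF reward discourages a BS from starving other users. The clipping in `ppo_clipped_objective` limits how far one update can move the transmit/no-transmit policy away from the policy that collected the data.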