Distributed Deep Reinforcement Learning for Adaptive Medium Access and Modulation in Shared Spectrum

Spectrum scarcity has led to growth in the use of unlicensed spectrum for cellular systems. This motivates intelligent adaptive approaches to spectrum access for both WiFi and 5G that improve upon traditional carrier sensing and listen-before-talk methods. We study decentralized contention-based medium access for base stations (BSs) of a single Radio Access Technology (RAT) operating on unlicensed shared spectrum. We devise a learning-based algorithm for both contention and adaptive modulation that attempts to maximize a network-wide downlink throughput objective. We formulate and develop novel distributed implementations of two deep reinforcement learning approaches, Deep Q Networks and Proximal Policy Optimization, modelled on a two-stage Markov decision process. Empirically, we find that the proportional fairness reward accumulated by the policy gradient approach is significantly higher than that of even a genie-aided adaptive energy detection threshold. Our approaches are further validated by improved sum and peak throughput. The scalability of our approach to large networks is demonstrated by an improved cumulative reward on both indoor and outdoor layouts with a large number of BSs.
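
The abstract names a proportional fairness reward but does not spell out its form. As a minimal sketch, assuming the standard definition (the sum of the logarithms of per-link throughputs), the following shows why such an objective favors balanced allocations over skewed ones with the same sum throughput. The function name, the `eps` floor, and the example numbers are illustrative assumptions, not taken from the paper.

```python
import math

def proportional_fairness(throughputs, eps=1e-9):
    """Sum of log-throughputs (assumed standard PF utility).

    eps is a hypothetical floor that keeps the log finite when a
    link receives zero throughput; the paper may handle this differently.
    """
    return sum(math.log(max(r, eps)) for r in throughputs)

# Two allocations with the same sum throughput (10 units in total).
balanced = [5.0, 5.0]
skewed = [9.0, 1.0]

pf_balanced = proportional_fairness(balanced)
pf_skewed = proportional_fairness(skewed)

# The balanced allocation scores higher under proportional fairness.
print(pf_balanced > pf_skewed)
```

Under this assumed form, starving one link drives its log term strongly negative, so a learning agent that maximizes the reward is pushed toward sharing the medium rather than maximizing sum throughput alone.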
