With the arrival of next-generation wireless communication, a growing number of new applications such as the Internet of Things, autonomous driving systems, and drones are crowding the unlicensed spectrum. Licensed networks such as Long-Term Evolution (LTE) are also moving into the unlicensed spectrum to deliver high-capacity content at low cost. However, LTE was not designed to share resources with other technologies, and previous coexistence solutions usually apply only to fixed scenarios. This work proposes a nonparametric Bayesian reinforcement learning algorithm to handle the coexistence of Wi-Fi and LTE licensed-assisted access (LTE-LAA) agents in the 5 GHz unlicensed spectrum. The coexistence problem is modeled as a decentralized partially observable Markov decision process (Dec-POMDP), and Bayesian inference with a nonparametric prior is adopted for policy learning to accommodate the uncertainty in each agent's policy. A fairness measure is incorporated into the reward function to encourage fair sharing between agents, and variational inference is used to approximate the posterior model, keeping the algorithm computationally efficient. Simulation results demonstrate that the algorithm reaches high value with compact policy representations within a few learning iterations.
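For reference, the abstract does not spell out the Dec-POMDP model it builds on; the standard textbook definition is the tuple

$\langle \mathcal{I}, \mathcal{S}, \{\mathcal{A}_i\}, T, R, \{\Omega_i\}, O, \gamma \rangle,$

where $\mathcal{I}$ is the set of agents, $\mathcal{S}$ the state space, $\mathcal{A}_i$ the action set of agent $i$, $T(s' \mid s, \vec{a})$ the state-transition function under the joint action $\vec{a}$, $R(s, \vec{a})$ the shared reward, $\Omega_i$ the observation set of agent $i$, $O(\vec{o} \mid s', \vec{a})$ the joint-observation function, and $\gamma$ the discount factor. Each agent (here, a Wi-Fi or LTE-LAA node) acts on its own observation history only, which is what makes the problem decentralized and partially observable.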
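The abstract does not state which fairness measure enters the reward function. A common choice in Wi-Fi/LTE coexistence studies is Jain's fairness index, and the sketch below only illustrates how such a fairness-weighted reward could be composed; jain_fairness, fair_reward, and the weight parameter are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def jain_fairness(throughputs):
    """Jain's fairness index: close to 1/n when one agent takes everything, 1 when all agents are equal."""
    x = np.asarray(throughputs, dtype=float)
    return x.sum() ** 2 / (len(x) * np.square(x).sum() + 1e-12)

def fair_reward(throughputs, weight=0.5):
    """Blend total throughput with a fairness term; 'weight' is a free design parameter (assumed here)."""
    total = float(np.sum(throughputs))
    return (1.0 - weight) * total + weight * jain_fairness(throughputs) * total

# Example: a skewed split of the same total airtime yields a lower reward than an even split.
print(fair_reward([10.0, 10.0]))  # even split between a Wi-Fi and an LTE-LAA agent
print(fair_reward([19.0, 1.0]))   # unfair split, penalized by the fairness index
```

The intent of such a reward shaping is simply that a joint policy maximizing long-term value cannot do so by starving one technology of channel access.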