Spectrum allocation in the form of primary channel and bandwidth selection is a key factor for dynamic channel bonding (DCB) wireless local area networks (WLANs). To cope with varying environments, where networks change their configurations on their own, the wireless community is looking toward solutions aided by machine learning (ML), and especially reinforcement learning (RL), given its trial-and-error approach. However, strong assumptions are normally made to let complex RL models converge to near-optimal solutions. Our goal with this paper is twofold: to justify in a comprehensible way why RL should be the approach for wireless network problems like decentralized spectrum allocation, and to call into question whether the use of complex RL algorithms helps the quest for rapid learning in realistic scenarios. We show that stateless RL in the form of lightweight multi-armed bandits (MABs) is an efficient solution for rapid adaptation, avoiding the definition of extensive or meaningless RL states.
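To illustrate the stateless MAB formulation, the following is a minimal sketch of an ε-greedy bandit for joint primary-channel and bandwidth selection. The abstract does not fix a particular bandit policy, so the ε-greedy rule, the arm definitions, and the reward signal (e.g., normalized throughput) are illustrative assumptions, not the paper's exact method.

```python
import random

class EpsilonGreedyBandit:
    """Stateless epsilon-greedy MAB: each arm is a spectrum configuration."""

    def __init__(self, arms, epsilon=0.1):
        self.arms = arms              # e.g., (primary_channel, bandwidth) pairs
        self.epsilon = epsilon        # exploration probability (assumed value)
        self.counts = {a: 0 for a in arms}
        self.values = {a: 0.0 for a in arms}  # running mean reward per arm

    def select(self):
        # Explore with probability epsilon; otherwise exploit the best estimate.
        if random.random() < self.epsilon:
            return random.choice(self.arms)
        return max(self.arms, key=lambda a: self.values[a])

    def update(self, arm, reward):
        # Incremental sample-mean update of the reward estimate for this arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

# Hypothetical usage: 4 candidate primary channels, bandwidths in MHz.
arms = [(ch, bw) for ch in range(4) for bw in (20, 40, 80)]
bandit = EpsilonGreedyBandit(arms)
for _ in range(1000):
    arm = bandit.select()
    reward = random.random()  # placeholder for a measured throughput reward
    bandit.update(arm, reward)
```

Note that the agent keeps no environment state at all: it only maintains per-arm reward statistics, which is what makes this formulation "stateless" and avoids defining extensive or meaningless RL states.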