We study a decentralized channel allocation problem in an ad-hoc Internet of Things (IoT) network underlaying the spectrum licensed to a primary cellular network. In the considered network, the limited channel sensing/probing capability and computational resources of the IoT devices make it difficult for them to acquire detailed Channel State Information (CSI) for the multiple shared channels. In practice, the unknown patterns of the primary users' transmission activities and the time-varying CSI (e.g., due to small-scale fading or device mobility) also cause stochastic changes in channel quality. Decentralized IoT links are thus expected to learn channel conditions online from partial observations, acquiring no information about the channels on which they are not operating. They also have to reach an efficient, collision-free channel allocation with limited coordination. Our study maps this problem to a contextual multi-player, multi-armed bandit game and proposes a purely decentralized, three-stage policy learning algorithm based on trial and error. Theoretical analysis shows that the proposed scheme guarantees that the IoT links jointly converge to the socially optimal channel allocation with sub-linear (i.e., polylogarithmic) regret with respect to the operational time. Simulations demonstrate that it strikes a good balance between efficiency and network scalability when compared with other state-of-the-art decentralized bandit algorithms.
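To make the notions of socially optimal allocation and regret concrete, the following is a minimal sketch and not the three-stage algorithm summarized above. It assumes that links choosing the same channel collide and receive zero reward, that the mean reward of each link on each channel is known to the evaluator, and that there are at least as many channels as links. All function names are hypothetical.

```python
from itertools import permutations


def collision_rewards(choices, rewards):
    """Per-link reward for one round; links sharing a channel get zero (assumed model)."""
    counts = {}
    for ch in choices:
        counts[ch] = counts.get(ch, 0) + 1
    return [rewards[i][ch] if counts[ch] == 1 else 0.0
            for i, ch in enumerate(choices)]


def optimal_allocation(mean_rewards):
    """Brute-force the collision-free allocation that maximizes the sum of mean rewards."""
    n_links = len(mean_rewards)
    n_channels = len(mean_rewards[0])

    def total(alloc):
        return sum(mean_rewards[i][c] for i, c in enumerate(alloc))

    # Each permutation assigns a distinct channel to every link.
    best = max(permutations(range(n_channels), n_links), key=total)
    return list(best), total(best)


def cumulative_regret(mean_rewards, history):
    """Sum over rounds of the gap between the optimal and the achieved expected sum reward."""
    _, best_value = optimal_allocation(mean_rewards)
    return sum(best_value - sum(collision_rewards(choices, mean_rewards))
               for choices in history)


if __name__ == "__main__":
    # Two links, three channels; rows are links, columns are channels.
    means = [[0.9, 0.5, 0.1],
             [0.8, 0.7, 0.2]]
    print(optimal_allocation(means))
    # Round 1: both links pick channel 0 and collide; round 2: optimal choice.
    print(cumulative_regret(means, [[0, 0], [0, 1]]))
```

The brute-force search is only practical for a handful of links and channels. A decentralized learner has no access to `mean_rewards` and must estimate them from its own observations, which is where the regret accumulates.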