In this study, we propose a contextual multi-armed bandit (CMAB)-based decentralized channel exploration framework that disentangles a channel utility function (i.e., reward) with respect to contending neighboring access points (APs). The proposed framework enables APs to evaluate observed rewards compositionally over contending APs, providing both robustness against reward fluctuations caused by neighboring APs' varying channels and the ability to assess even unexplored channels. To realize this framework, we propose contention-driven feature extraction (CDFE), which extracts the adjacency relation among APs under contention and forms the basis for expressing the reward function in a disentangled form, that is, as a linear combination of parameters associated with neighboring APs under contention. This allows the CMAB to be leveraged with a joint linear upper confidence bound (JLinUCB) exploration strategy and lets us examine the effectiveness of the proposed framework. Moreover, we address the problem of non-convergence (the channel exploration cycle) by proposing a penalized JLinUCB (P-JLinUCB), whose key idea is to discount the reward when an AP exploits a different channel before and after a learning round. Numerical evaluations confirm that CDFE allows APs to assess channel quality robustly against reward fluctuations and that P-JLinUCB achieves better convergence properties.
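As a rough illustration of the mechanism the abstract describes, the sketch below implements a generic linear UCB bandit with a switching penalty in the spirit of P-JLinUCB. This is not the paper's algorithm: the class name `LinUCBChannelExplorer`, the `switch_penalty` parameter, and the per-channel feature vectors are all illustrative assumptions; in the paper's setting, the features would instead come from CDFE, and the penalty is applied as a discount on the reward rather than on the score.

```python
import numpy as np


class LinUCBChannelExplorer:
    """Minimal linear UCB sketch (illustrative; not the paper's JLinUCB).

    The reward for each arm (channel) is assumed to be linear in a
    feature vector x, i.e. E[r] = theta^T x, mirroring the disentangled
    reward form described in the abstract.
    """

    def __init__(self, dim, alpha=1.0, switch_penalty=0.0):
        self.alpha = alpha                    # exploration width
        self.switch_penalty = switch_penalty  # hypothetical discount for changing channels
        self.A = np.eye(dim)                  # ridge-regularized Gram matrix
        self.b = np.zeros(dim)                # accumulated reward-weighted features
        self.prev_arm = None                  # last exploited channel

    def select(self, features):
        """features: dict arm -> feature vector; returns the arm with the largest UCB score."""
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b                # ridge estimate of the reward parameters
        best_arm, best_score = None, -np.inf
        for arm, x in features.items():
            score = theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)
            if self.prev_arm is not None and arm != self.prev_arm:
                score -= self.switch_penalty  # discourage oscillating channel changes
            if score > best_score:
                best_arm, best_score = arm, score
        return best_arm

    def update(self, arm, x, reward):
        """Standard LinUCB update of the ridge-regression statistics."""
        self.A += np.outer(x, x)
        self.b += reward * x
        self.prev_arm = arm
```

A usage round would call `select` to pick a channel, observe a reward, and feed it back through `update`; over repeated rounds the estimate `theta` concentrates on the better channel while the penalty suppresses needless switching.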