Next-generation wireless services have a diverse set of requirements, and to sustain them, wireless access points need to probe the users in the network periodically. In this regard, we study a novel multi-armed bandit (MAB) setting that mandates probing all the arms periodically while keeping track of the current best arm in a non-stationary environment. In particular, we develop \texttt{TS-GE}, which balances the regret guarantees of classical Thompson sampling (TS) with broadcast probing (BP) of all the arms simultaneously in order to actively detect a change in the reward distributions. The main innovation in the algorithm is in identifying the changed arm by an optional subroutine called group exploration (GE) that scales as $\log_2(K)$ for a $K$-armed bandit setting. We characterize the probability of missed detection and the probability of false alarm in terms of the environment parameters. We highlight the conditions under which the regret guarantee of \texttt{TS-GE} outperforms that of the state-of-the-art algorithms, in particular, \texttt{ADSWITCH} and \texttt{M-UCB}. We demonstrate the efficacy of \texttt{TS-GE} by employing it in two wireless system applications: task offloading in mobile-edge computing (MEC) and an industrial internet-of-things (IIoT) network designed for simultaneous wireless information and power transfer (SWIPT).
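To illustrate why a group-based search can locate a changed arm in about $\log_2(K)$ probes, the following is a minimal sketch and not the GE subroutine of \texttt{TS-GE} itself. It assumes that exactly one arm has changed and that a noiseless oracle, here the hypothetical callable `group_changed`, reports whether a probed group contains that arm. In practice this oracle would be a statistical change test on noisy rewards, which is where missed detections and false alarms arise. The function name `group_exploration` and the bitwise grouping are illustrative assumptions.

```python
from typing import Callable, List


def group_exploration(num_arms: int,
                      group_changed: Callable[[List[int]], bool]) -> int:
    """Return the index of the single changed arm.

    Assumptions (not taken from the paper): exactly one arm changed, and
    `group_changed(group)` is a noiseless oracle that returns True when the
    changed arm belongs to `group`.
    """
    # Number of bits needed to index K arms, i.e. ceil(log2(K)) for K >= 2.
    num_bits = (num_arms - 1).bit_length()
    changed_arm = 0
    for bit in range(num_bits):
        # Group b holds every arm whose index has bit b set.
        group = [arm for arm in range(num_arms) if (arm >> bit) & 1]
        if group_changed(group):
            # The changed arm's index has this bit set.
            changed_arm |= 1 << bit
    return changed_arm


if __name__ == "__main__":
    true_changed_arm = 5  # hypothetical ground truth for the demo
    found = group_exploration(8, lambda group: true_changed_arm in group)
    print(found)
```

With $K = 8$ arms the sketch makes three group probes, one per bit of the arm index, instead of probing each of the eight arms separately.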