We study a novel multi-armed bandit (MAB) setting that requires the agent to probe all the arms periodically in a non-stationary environment. In particular, we develop \texttt{TS-GE}, which balances the regret guarantees of classical Thompson sampling (TS) with broadcast probing (BP) of all the arms simultaneously in order to actively detect a change in the reward distributions. Once a system-level change is detected, the changed arm is identified by an optional subroutine called group exploration (GE), which scales as $\log_2(K)$ for a $K$-armed bandit setting. We characterize the probability of missed detection and the probability of false alarm in terms of the environment parameters. The change-detection latency is upper bounded by $\sqrt{T}$, and within a period of $\sqrt{T}$ all the arms are probed at least once. We highlight the conditions under which the regret guarantee of \texttt{TS-GE} outperforms that of the state-of-the-art algorithms, in particular \texttt{ADSWITCH} and \texttt{M-UCB}. Furthermore, unlike existing bandit algorithms, \texttt{TS-GE} can be deployed for applications such as timely status updates, critical control, and wireless energy transfer, which are essential features of next-generation wireless communication networks. We demonstrate the efficacy of \texttt{TS-GE} by employing it in an industrial internet-of-things (IIoT) network designed for simultaneous wireless information and power transfer (SWIPT).
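To illustrate why identifying a single changed arm among $K$ can scale as $\log_2(K)$, the following is a minimal sketch of a halving search over groups of arms. It is not the GE subroutine of \texttt{TS-GE}, whose details are not given here. The function name `identify_changed_arm` and the oracle `group_changed` are hypothetical, and the sketch assumes exactly one arm has changed and that probing a group reveals, without error, whether the changed arm belongs to it.

```python
import math
from typing import Callable, Sequence, Tuple


def identify_changed_arm(
    arms: Sequence[int],
    group_changed: Callable[[Sequence[int]], bool],
) -> Tuple[int, int]:
    """Return (changed_arm, number_of_group_probes).

    Assumption: exactly one arm changed, and group_changed(group) is a
    noiseless, hypothetical oracle that says whether it lies in `group`.
    """
    candidates = list(arms)
    probes = 0
    while len(candidates) > 1:
        # Probe the first half of the remaining candidates as one group.
        half = candidates[: len(candidates) // 2]
        probes += 1
        if group_changed(half):
            candidates = half
        else:
            candidates = candidates[len(half):]
    return candidates[0], probes


# Usage: K = 8 arms, arm 5 has changed (illustrative values only).
K = 8
changed = 5
arm, probes = identify_changed_arm(range(K), lambda g: changed in g)
print(arm, probes)  # prints: 5 3
print(probes <= math.ceil(math.log2(K)))  # prints: True
```

Each probe at least halves the candidate set, so at most $\lceil \log_2(K) \rceil$ group probes are needed under these assumptions. In a real system the oracle would be a statistical test on noisy rewards, which is where the probabilities of missed detection and false alarm enter.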