We study distributed contextual linear bandits with stochastic contexts, where $N$ agents act cooperatively to solve a linear bandit-optimization problem with $d$-dimensional features. For this problem, we propose DisBE-LUCB, a distributed batch elimination version of the LinUCB algorithm in which the agents share information with one another through a central server. We prove that over $T$ rounds ($NT$ actions in total) the communication cost of DisBE-LUCB is only $\tilde{\mathcal{O}}(dN)$ and its regret is at most $\tilde{\mathcal{O}}(\sqrt{dNT})$, which is of the same order as that incurred by an optimal single-agent algorithm for $NT$ rounds. Remarkably, we derive an information-theoretic lower bound on the communication cost of the distributed contextual linear bandit problem with stochastic contexts, and prove that our proposed algorithm is nearly minimax optimal in terms of \emph{both regret and communication cost}. Finally, we propose DecBE-LUCB, a fully decentralized version of DisBE-LUCB that operates without a central server: agents share information with their \emph{immediate neighbors} through a carefully designed consensus procedure.
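To make the server-based batch elimination idea concrete, the following is a minimal sketch under stated assumptions; it is not DisBE-LUCB itself. It assumes each agent sends ridge-regression sufficient statistics ($V_i = \sum x x^\top$, $b_i = \sum x r$) for one batch to the server, which sums them, forms a least-squares estimate, and keeps only actions whose upper confidence bound is at least the largest lower confidence bound. The function names `local_statistics`, `server_aggregate`, `eliminate` and the parameters `lam` (ridge regularizer) and `beta` (confidence width) are hypothetical.

```python
import numpy as np

def local_statistics(contexts, rewards):
    """Per-agent sufficient statistics for ridge regression over one batch.

    contexts: (n, d) features of the actions this agent played.
    rewards:  (n,) observed rewards.
    """
    X = np.asarray(contexts, dtype=float)
    y = np.asarray(rewards, dtype=float)
    return X.T @ X, X.T @ y

def server_aggregate(stats, d, lam=1.0):
    """Hypothetical server step: sum all agents' statistics, return ridge estimate."""
    V = lam * np.eye(d)
    b = np.zeros(d)
    for V_i, b_i in stats:
        V += V_i
        b += b_i
    theta_hat = np.linalg.solve(V, b)
    return theta_hat, V

def eliminate(actions, theta_hat, V, beta):
    """Return indices of actions whose UCB is at least the largest LCB."""
    A = np.asarray(actions, dtype=float)
    V_inv = np.linalg.inv(V)
    means = A @ theta_hat
    # Confidence width beta * sqrt(x^T V^{-1} x) for every candidate action x.
    widths = beta * np.sqrt(np.einsum("ij,jk,ik->i", A, V_inv, A))
    ucb = means + widths
    lcb = means - widths
    return np.flatnonzero(ucb >= lcb.max())

# Usage: N agents each play n random contexts, then the server eliminates.
rng = np.random.default_rng(0)
d, N, n = 3, 4, 200
theta = np.array([1.0, -0.5, 0.2])  # assumed true parameter for the demo
stats = []
for _ in range(N):
    X = rng.normal(size=(n, d))
    r = X @ theta + 0.1 * rng.normal(size=n)
    stats.append(local_statistics(X, r))
theta_hat, V = server_aggregate(stats, d)
actions = np.eye(d)  # three candidate actions (standard basis vectors)
kept = eliminate(actions, theta_hat, V, beta=1.0)
```

Note that this naive scheme uploads a $d \times d$ matrix per agent per batch, so on its own it does not reach the $\tilde{\mathcal{O}}(dN)$ communication cost stated above; it only illustrates the aggregate-then-eliminate structure.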