We study distributed contextual linear bandits with stochastic contexts, where $N$ agents act cooperatively to solve a linear bandit-optimization problem with $d$-dimensional features over the course of $T$ rounds. For this problem, we derive the first ever information-theoretic lower bound $\Omega(dN)$ on the communication cost of any algorithm that performs optimally in a regret minimization setup. We then propose a distributed batch elimination version of the LinUCB algorithm, DisBE-LUCB, where the agents share information among each other through a central server. We prove that the communication cost of DisBE-LUCB matches our lower bound up to logarithmic factors. In particular, for scenarios with known context distribution, the communication cost of DisBE-LUCB is only $\tilde{\mathcal{O}}(dN)$ and its regret is ${\tilde{\mathcal{O}}}(\sqrt{dNT})$, which is of the same order as that incurred by an optimal single-agent algorithm for $NT$ rounds. We also provide similar bounds for practical settings where the context distribution can only be estimated. Therefore, our proposed algorithm is nearly minimax optimal in terms of \emph{both regret and communication cost}. Finally, we propose DecBE-LUCB, a fully decentralized version of DisBE-LUCB, which operates without a central server, where agents share information with their \emph{immediate neighbors} through a carefully designed consensus procedure.
翻译:我们研究的是背景线性土匪,其中以美元为单位,在T美元回合中合作解决线性土匪操作问题。对于这个问题,我们从任何在最遗憾的最小化设置中最佳表现的算法的通信成本中,首次得出信息理论较低约束$Omega(dN)$(dN)$。然后我们提出一个分批清除版本LinUCB算法DisBE-LUCB(DisBE-LUCB),代理商通过中央服务器彼此分享信息。我们证明DisBE-LUCB的通信成本与我们较低约束的对数因素相匹配。对于已知的环境分布,DisBE-LUCB的通信成本仅为$trd*(dNTB),而它的通信成本是美元分流式的分流式版本。我们用最优的单级的单级算算法,我们用最精细的版本来估算了最精细的汇率。