We consider distributed linear bandits where $M$ agents learn collaboratively to minimize the overall cumulative regret incurred by all agents. Information exchange is facilitated by a central server, and both the uplink and downlink communications are carried over channels with fixed capacity, which limits the amount of information that can be transmitted in each use of the channels. We investigate the regret-communication trade-off by (i) establishing information-theoretic lower bounds on the communications (in terms of bits) required to achieve a sublinear regret order; (ii) developing an efficient algorithm that achieves the minimum sublinear regret order offered by centralized learning while using only the minimum order of communications dictated by the information-theoretic lower bounds. For sparse linear bandits, we show that a variant of the proposed algorithm offers a better regret-communication trade-off by leveraging the sparsity of the problem.