This paper presents a novel federated linear contextual bandits model, where individual clients face different $K$-armed stochastic bandits coupled through common global parameters. By leveraging the geometric structure of the linear rewards, a collaborative algorithm called Fed-PE is proposed to cope with the heterogeneity across clients without exchanging local feature vectors or raw data. Fed-PE relies on a novel multi-client G-optimal design, and achieves near-optimal regrets for both disjoint and shared parameter cases with logarithmic communication costs. In addition, a new concept called collinearly-dependent policies is introduced, based on which a tight minimax regret lower bound for the disjoint parameter case is derived. Experiments demonstrate the effectiveness of the proposed algorithms on both synthetic and real-world datasets.
翻译:本文展示了一个新的联盟线性背景强盗模型,在这个模型中,个体客户面临不同的KK美元武装随机强盗,并通过共同的全球参数来面对不同的全球参数。 通过利用线性奖赏的几何结构,一个名为Fed-PE的协作算法(Fed-PE)被提出来应对客户之间的异质性,而不必交换本地地物矢量或原始数据。 Fed-PE依靠一种新型的多客户G-最佳设计,并实现几乎最佳的遗憾,因为使用对数通信成本的脱节和共享参数案例。此外,还引入了一个新的概念 — — 共线性政策 — — 在此基础上,得出了紧凑的迷你牛对脱节参数案例的下限。 实验显示了合成和真实世界数据集的拟议算法的有效性。