Federated Multi-Armed Bandits

Federated multi-armed bandits (FMAB) is a new bandit paradigm that parallels the federated learning (FL) framework in supervised learning. It is inspired by practical applications in cognitive radio and recommender systems, and enjoys features that are analogous to FL. This paper proposes a general framework of FMAB and then studies two specific federated bandit models. We first study the approximate model where the heterogeneous local models are random realizations of the global model from an unknown distribution. This model introduces a new uncertainty of client sampling, as the global model may not be reliably learned even if the finite local models are perfectly known. Furthermore, this uncertainty cannot be quantified a priori without knowledge of the suboptimality gap. We solve the approximate model by proposing Federated Double UCB (Fed2-UCB), which constructs a novel "double UCB" principle accounting for uncertainties from both arm and client sampling. We show that gradually admitting new clients is critical in achieving an O(log(T)) regret while explicitly considering the communication cost. The exact model, where the global bandit model is the exact average of heterogeneous local models, is then studied as a special case. We show that, somewhat surprisingly, the order-optimal regret can be achieved independent of the number of clients with a careful choice of the update periodicity. Experiments using both synthetic and real-world datasets corroborate the theoretical analysis and demonstrate the effectiveness and efficiency of the proposed algorithms.
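The client-sampling uncertainty described in the abstract can be illustrated with a small simulation. The sketch below is not Fed2-UCB and is not taken from the paper. It assumes, purely for illustration, that each client's local arm means are the global means plus Gaussian noise, and that local means are known exactly, so there is no arm-sampling noise. All function names and parameters (`sample_local_means`, `noise_std`, and so on) are hypothetical.

```python
import random


def sample_local_means(global_means, num_clients, noise_std, rng):
    """Draw each client's local arm means as noisy realizations of the global means.

    Assumption: Gaussian perturbations. The abstract only says the local
    models come from an unknown distribution.
    """
    return [
        [mu + rng.gauss(0.0, noise_std) for mu in global_means]
        for _ in range(num_clients)
    ]


def average_model(local_means):
    """Average the local arm means over the given clients (exact-model view)."""
    num_clients = len(local_means)
    num_arms = len(local_means[0])
    return [
        sum(client[k] for client in local_means) / num_clients
        for k in range(num_arms)
    ]


def best_arm(means):
    """Index of the arm with the largest mean."""
    return max(range(len(means)), key=lambda k: means[k])


def misidentification_rate(global_means, num_clients, noise_std, trials, seed=0):
    """Fraction of trials in which the averaged local models pick the wrong arm,
    even though every local mean is known perfectly."""
    rng = random.Random(seed)
    true_best = best_arm(global_means)
    errors = 0
    for _ in range(trials):
        local = sample_local_means(global_means, num_clients, noise_std, rng)
        if best_arm(average_model(local)) != true_best:
            errors += 1
    return errors / trials


if __name__ == "__main__":
    global_means = [0.5, 0.6]  # hypothetical two-armed global model
    for m in (2, 20, 200):
        rate = misidentification_rate(global_means, m, noise_std=0.3, trials=500)
        print(f"clients={m:4d}  misidentification rate={rate:.3f}")
```

With few clients, the averaged model can rank the arms incorrectly even with perfect local knowledge. Involving more clients reduces that risk, which is consistent with the abstract's point that gradually admitting new clients matters. How many clients suffice depends on the suboptimality gap, which is unknown in advance.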
