Contextual bandit algorithms have recently been studied under the federated learning setting to meet the demand for keeping data decentralized and pushing the learning of bandit models to the client side. But, limited by the required communication efficiency, existing solutions are restricted to linear models, which admit closed-form solutions for parameter estimation. Such a restricted model choice greatly hampers these algorithms' practical utility. In this paper, we take the first step toward addressing this challenge by studying generalized linear bandit models under the federated learning setting. We propose a communication-efficient solution framework that employs online regression for local updates and offline regression for global updates. We rigorously prove that, though the setting is more general and challenging, our algorithm can attain a sub-linear rate in both regret and communication cost, which is also validated by our extensive empirical evaluations.
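The local-online / global-offline split described in the abstract can be illustrated with a minimal simulation. The sketch below assumes a logistic (generalized linear) reward model; all names (`local_sgd_step`, `global_offline_mle`) and the fixed synchronization schedule are illustrative choices, not the paper's actual algorithm, and exploration is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_clients, T = 5, 3, 200
theta_star = rng.normal(size=d)  # unknown true parameter

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def local_sgd_step(theta, x, r, lr=0.1):
    # Online regression on the client: one SGD step on the logistic loss.
    grad = (sigmoid(x @ theta) - r) * x
    return theta - lr * grad

def global_offline_mle(X, R, iters=50, lr=0.5):
    # Offline regression at the server: batch gradient descent on pooled data.
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (sigmoid(X @ theta) - R) / len(R)
        theta -= lr * grad
    return theta

thetas = [np.zeros(d) for _ in range(n_clients)]
X_all, R_all = [], []
for t in range(T):
    for i in range(n_clients):
        arms = rng.normal(size=(10, d))
        x = arms[np.argmax(arms @ thetas[i])]        # greedy arm choice (no exploration)
        r = float(rng.random() < sigmoid(x @ theta_star))  # Bernoulli reward
        thetas[i] = local_sgd_step(thetas[i], x, r)  # cheap local online update
        X_all.append(x)
        R_all.append(r)
    if (t + 1) % 50 == 0:                            # infrequent syncs keep communication low
        theta_g = global_offline_mle(np.array(X_all), np.array(R_all))
        thetas = [theta_g.copy() for _ in range(n_clients)]
```

Between synchronizations each client only runs constant-cost online updates on its own data; the more expensive offline regression happens at the server on the infrequent communication rounds, which is what keeps the communication cost sub-linear.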