Multi-armed bandits (MAB) is a simple reinforcement learning model in which the learner controls the trade-off between exploration and exploitation to maximize its cumulative reward. Federated multi-armed bandits (FMAB) is a recently emerging framework in which a cohort of learners with heterogeneous local models plays a MAB game and communicates aggregated feedback to a parameter server to learn the global feedback model. Federated learning models are vulnerable to adversarial attacks such as model-update attacks or data poisoning. In this work, we study an FMAB problem in the presence of Byzantine clients, who can send false model updates that threaten the learning process. We borrow tools from robust statistics and propose Fed-MoM-UCB, a median-of-means-based estimator, to cope with the Byzantine clients. We show that if the Byzantine clients constitute at most half the cohort, it is possible to incur a cumulative regret, including the communication cost between the clients and the parameter server, on the order of ${\cal O} (\log T)$ with respect to an unavoidable error margin. We analyze the interplay between the algorithm parameters, the unavoidable error margin, the regret, the communication cost, and the arms' suboptimality gaps. Experiments demonstrate Fed-MoM-UCB's effectiveness compared with baseline methods in the presence of Byzantine attacks.
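The basic median-of-means idea behind a Byzantine-robust aggregator can be sketched as follows. This is a minimal Python illustration under our own assumptions, not the authors' Fed-MoM-UCB. Each client is assumed to report one empirical mean for a given arm. The server splits the reports into groups round-robin (an illustrative choice; randomized partitions are also common), averages each group, and takes the median of the group means. It then adds a standard UCB1-style bonus $\sqrt{2 \ln t / n}$. The names `median_of_means` and `robust_ucb_index`, and the simplification that every client pulled the arm the same number of times, are hypothetical.

```python
import math
import statistics


def median_of_means(values, num_groups):
    """Median-of-means estimate of the center of `values`.

    The values are split into `num_groups` disjoint groups (round-robin,
    an illustrative choice), each group is averaged, and the median of the
    group means is returned. If fewer than half of the groups contain a
    corrupted value, the result lies between the smallest and largest
    honest group means.
    """
    if not 1 <= num_groups <= len(values):
        raise ValueError("num_groups must be between 1 and len(values)")
    groups = [values[i::num_groups] for i in range(num_groups)]
    group_means = [statistics.fmean(g) for g in groups]
    return statistics.median(group_means)


def robust_ucb_index(client_means, pulls_per_client, t, num_groups):
    """Hypothetical UCB-style index built on the median-of-means center.

    Assumes every client pulled the arm `pulls_per_client` times and t >= 1.
    The sqrt(2 ln t / n) bonus is the standard UCB1 form, used here only
    for illustration.
    """
    center = median_of_means(client_means, num_groups)
    bonus = math.sqrt(2.0 * math.log(t) / pulls_per_client)
    return center + bonus


if __name__ == "__main__":
    honest = [0.5] * 7      # reports from honest clients
    byzantine = [10.0] * 3  # false reports from Byzantine clients
    reports = honest + byzantine
    print("plain mean:      ", statistics.fmean(reports))
    print("median of means: ", median_of_means(reports, num_groups=7))
    print("robust UCB index:", robust_ucb_index(reports, pulls_per_client=20, t=100, num_groups=7))
```

With seven honest reports of 0.5 and three Byzantine reports of 10.0, the plain mean is pulled up to about 3.35. The median of the seven group means stays at 0.5, because only three of the seven groups contain a corrupted report.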