联邦土匪:一种采用政府做法的方法 (Federated Bandit: A Gossiping Approach)

In this paper, we study \emph{Federated Bandit}, a decentralized Multi-Armed Bandit problem with a set of $N$ agents, who can only communicate their local data with neighbors described by a connected graph $G$. Each agent makes a sequence of decisions on selecting an arm from $M$ candidates, yet they only have access to local and potentially biased feedback/evaluation of the true reward for each action taken. Learning only locally will lead agents to sub-optimal actions while converging to a no-regret strategy requires a collection of distributed data. Motivated by the proposal of federated learning, we aim for a solution with which agents will never share their local observations with a central entity, and will be allowed to only share a private copy of his/her own information with their neighbors. We first propose a decentralized bandit algorithm Gossip_UCB, which is a coupling of variants of both the classical gossiping algorithm and the celebrated Upper Confidence Bound (UCB) bandit algorithm. We show that Gossip_UCB successfully adapts local bandit learning into a global gossiping process for sharing information among connected agents, and achieves guaranteed regret at the order of $O(\max\{ \texttt{poly}(N,M) \log T, \texttt{poly}(N,M)\log_{\lambda_2^{-1}} N\})$ for all $N$ agents, where $\lambda_2\in(0,1)$ is the second largest eigenvalue of the expected gossip matrix, which is a function of $G$. We then propose Fed_UCB, a differentially private version of Gossip_UCB, in which the agents preserve $\epsilon$-differential privacy of their local data while achieving $O(\max \{\frac{\texttt{poly}(N,M)}{\epsilon}\log^{2.5} T, \texttt{poly}(N,M) (\log_{\lambda_2^{-1}} N + \log T) \})$ regret.

翻译：本文中, 我们研究的是 emph{ 联邦土匪 }, 多武装土匪问题, 由一组美元代理商组成, 他们只能与邻居交流本地数据, 以连接图形 $ G$ 。每个代理商都会从候选人中做出一系列关于从$M 选择手臂的决定, 但是他们只能获得本地的且潜在有偏差的反馈/ 对每次行动的真正奖赏的反馈/ 评估。只有本地学习才能导致代理商的亚最佳行动, 而要融合到一个无报复战略, 需要收集已分发的数据。根据联合学习的建议, 我们的目标是找到一个解决方案, 代理商永远不会与一个中央实体分享本地观测结果, 并且只允许与邻居分享他/ 她自己信息的私人副本。我们首先提出一个分散的土匪算算法 Gosimal Gosip_, 这是经典流言运的第二个版本和高信任( UCB) 所有的基算算算。我们显示, 将Gosip_ 成功调整本地土匪, 将Oal deal deal 。