Communication bottleneck and data privacy are two critical concerns in federated multi-armed bandit (MAB) problems, such as situations in decision-making and recommendations of connected vehicles via wireless. In this paper, we design the privacy-preserving communication-efficient algorithm in such problems and study the interactions among privacy, communication and learning performance in terms of the regret. To be specific, we design privacy-preserving learning algorithms and communication protocols and derive the learning regret when networked private agents are performing online bandit learning in a master-worker, a decentralized and a hybrid structure. Our bandit learning algorithms are based on epoch-wise sub-optimal arm eliminations at each agent and agents exchange learning knowledge with the server/each other at the end of each epoch. Furthermore, we adopt the differential privacy (DP) approach to protect the data privacy at each agent when exchanging information; and we curtail communication costs by making less frequent communications with fewer agents participation. By analyzing the regret of our proposed algorithmic framework in the master-worker, decentralized and hybrid structures, we theoretically show tradeoffs between regret and communication costs/privacy. Finally, we empirically show these trade-offs which are consistent with our theoretical analysis.
翻译:通信瓶颈和数据隐私是联合多武装土匪(MAB)问题的两个关键问题,如通过无线接通车辆的决策和建议等情况。在本文中,我们设计了在这些问题中保护隐私的通信效率算法,从遗憾的角度研究隐私、通信和学习表现之间的相互作用。具体地说,我们设计了保护隐私的学习算法和通信协议,当网络化私人代理人在一个主工、分散化和混合结构中进行在线土匪学习时,就会产生学习的遗憾。我们土匪学习算法的基础是每个代理和代理人在每一个小区末端与服务器/相互交换学习知识的偏差亚最佳手臂。此外,我们采用差异性隐私(DP)办法,在交流信息时保护每个代理人的数据隐私;我们减少通信频率,减少代理人的参与,从而减少通信成本。通过分析我们在主工、分散化和混合结构中拟议的算法框架的遗憾,我们理论上显示在后悔与通信成本/原始分析中相互权衡,我们的经验性地展示了这些理论性分析。