The cooperative learning paradigm for online learning, i.e., federated learning (FL), is rapidly gaining traction. Unlike most FL settings, however, there are many situations where the participating agents are competitive. Each agent would like to learn from the others, but the information it must share for others to learn from could be sensitive; thus, it desires to keep that information private. This work investigates a group of agents that concurrently solve similar combinatorial bandit problems while maintaining quality constraints. Can these agents learn collectively while keeping their sensitive information confidential by employing differential privacy? We observe that communication can reduce regret. However, the differential-privacy techniques used to protect sensitive information make the shared data noisy, which may hurt rather than help regret. Hence, it is essential to decide when to communicate and which shared data to learn from in order to strike a functional balance between regret and privacy. For such a federated combinatorial MAB setting, we propose a Privacy-preserving Federated Combinatorial Bandit algorithm, P-FCB. We illustrate the efficacy of P-FCB through simulations, and further show that P-FCB improves regret while upholding the quality threshold and providing meaningful privacy guarantees.
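To give a rough sense of why privacy protection degrades the quality of shared data, the sketch below perturbs an agent's per-arm statistics with Gaussian-mechanism noise before they are communicated. This is a minimal illustration, not the P-FCB algorithm itself: the function `privatize_estimates`, the choice of the Gaussian mechanism, and the assumption that rewards lie in [0, 1] are all assumptions made here for exposition.

```python
import numpy as np

def privatize_estimates(counts, reward_sums, epsilon, delta, sensitivity=1.0):
    """Add Gaussian-mechanism noise to per-arm pull counts and reward sums
    before sharing, so the released statistics satisfy (epsilon, delta)-DP.
    Rewards are assumed to lie in [0, 1], giving per-round sensitivity 1."""
    # Standard Gaussian-mechanism noise scale for (epsilon, delta)-DP.
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    noisy_counts = counts + np.random.normal(0.0, sigma, size=counts.shape)
    noisy_sums = reward_sums + np.random.normal(0.0, sigma, size=reward_sums.shape)
    return noisy_counts, noisy_sums

# Example: an agent releases noisy statistics for 5 arms.
counts = np.array([120.0, 80.0, 60.0, 45.0, 30.0])
reward_sums = np.array([84.0, 40.0, 42.0, 18.0, 9.0])
print(privatize_estimates(counts, reward_sums, epsilon=1.0, delta=1e-5))
```

Smaller privacy budgets (epsilon) inflate the noise scale, which is exactly the tension the abstract describes: other agents receive noisier estimates, so naively learning from every shared message can worsen rather than improve regret.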