Contextual multi-armed bandit (MAB) is an important sequential decision-making problem in recommendation systems. A line of works, called the clustering of bandits (CLUB), utilize the collaborative effect over users and dramatically improve the recommendation quality. Owing to the increasing application scale and public concerns about privacy, there is a growing demand to keep user data decentralized and push bandit learning to the local server side. Existing CLUB algorithms, however, are designed under the centralized setting where data are available at a central server. We focus on studying the federated online clustering of bandit (FCLUB) problem, which aims to minimize the total regret while satisfying privacy and communication considerations. We design a new phase-based scheme for cluster detection and a novel asynchronous communication protocol for cooperative bandit learning for this problem. To protect users' privacy, previous differential privacy (DP) definitions are not very suitable, and we propose a new DP notion that acts on the user cluster level. We provide rigorous proofs to show that our algorithm simultaneously achieves (clustered) DP, sublinear communication complexity and sublinear regret. Finally, experimental evaluations show our superior performance compared with benchmark algorithms.
翻译:多武装土匪(MAB)是建议系统中一个重要的顺序决策问题。一行工作,称为土匪群集(CLUB),利用对用户的协作效应,大幅提高建议质量。由于应用规模的扩大和公众对隐私的担忧,越来越需要保持用户数据分散化,并将土匪的学习推到本地服务器方面。但现有的CLUB算法是在中央服务器数据可用集中设置下设计的。我们侧重于研究土匪团团团(FCLUB)的在线联合问题,目的是在满足隐私和通信考虑的同时最大限度地减少全面的遗憾。我们设计了一个新的基于阶段的集束探测计划,为合作土匪队学习这一问题设计了一个新的非同步通信协议。为了保护用户的隐私,以前的差异隐私(DP)定义并不十分合适,我们提出了一个新的DP概念,即用户群集层一级。我们提供了严格的证据,以证明我们的算法同时实现了(集束式)DP、亚线通信复杂性和子线性遗憾。最后,实验性评价显示了我们与基准算法相比的优劣性。