Contextual bandit algorithms are useful for personalized online decision-making. However, many applications, such as personalized medicine and online advertising, require individual-specific information for effective learning, while users' data must remain private from the server due to privacy concerns. This motivates introducing local differential privacy (LDP), a stringent notion of privacy, to contextual bandits. In this paper, we design LDP algorithms for stochastic generalized linear bandits that achieve the same regret bound as in non-private settings. Our main idea is to develop a stochastic gradient-based estimator and update mechanism that ensure LDP. We then exploit the flexibility of stochastic gradient descent (SGD), whose theoretical guarantees for bandit problems are rarely explored, in dealing with generalized linear bandits. We also develop an estimator and update mechanism based on Ordinary Least Squares (OLS) for linear bandits. Finally, we conduct experiments on both simulated and real-world datasets to demonstrate the consistently superior performance of our algorithms under LDP constraints with reasonably small privacy parameters $(\varepsilon, \delta)$, ensuring strong privacy protection.
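To make the main idea concrete, the following is a minimal sketch (not the paper's exact algorithm) of how a user could privatize a local SGD gradient for a logistic generalized linear model before sending it to the server: the gradient is clipped to bound its sensitivity, then perturbed with Gaussian-mechanism noise calibrated to $(\varepsilon, \delta)$. All function and parameter names are illustrative assumptions.

```python
import numpy as np

def ldp_sgd_step(theta, x, y, eps, delta, clip=1.0, lr=0.1, rng=None):
    """One illustrative privatized SGD step for a logistic bandit model.

    The user clips its local gradient to L2 norm `clip` and adds
    Gaussian-mechanism noise before reporting; the server then applies
    a standard gradient update. Names are hypothetical, not the paper's.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Local gradient of the logistic loss at context x with reward y.
    pred = 1.0 / (1.0 + np.exp(-x @ theta))
    grad = (pred - y) * x
    # Clip to bound the L2 sensitivity of the reported gradient.
    norm = np.linalg.norm(grad)
    if norm > clip:
        grad = grad * (clip / norm)
    # Gaussian mechanism: noise scale for (eps, delta)-LDP on one report.
    sigma = clip * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    noisy_grad = grad + rng.normal(0.0, sigma, size=grad.shape)
    # Server-side SGD update using only the privatized gradient.
    return theta - lr * noisy_grad
```

Smaller $\varepsilon$ or $\delta$ inflates `sigma`, so the server sees a noisier gradient; the regret analysis in the paper shows this noise can be absorbed without degrading the regret rate.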