Bandit algorithms have become a reference solution for interactive recommendation. However, because such algorithms directly interact with users to improve recommendations, serious privacy concerns have been raised regarding their practical use. In this work, we propose a differentially private linear contextual bandit algorithm that uses a tree-based mechanism to add Laplace or Gaussian noise to model parameters. Our key insight is that as the model converges during online updates, the global sensitivity of its parameters shrinks over time (hence named dynamic global sensitivity). Compared with existing solutions, our dynamic global sensitivity analysis allows us to inject less noise to obtain $(\epsilon, \delta)$-differential privacy, with the added regret caused by noise injection bounded by $\tilde O(\log{T}\sqrt{T}/\epsilon)$. We provide a rigorous theoretical analysis of the amount of noise added via dynamic global sensitivity and of the corresponding upper regret bound of our proposed algorithm. Experimental results on both synthetic and real-world datasets confirm the algorithm's advantage over existing solutions.
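To illustrate the tree-based mechanism the abstract refers to, the following is a minimal sketch of the classic binary tree (counting) mechanism for releasing differentially private prefix sums with Laplace noise. This is an illustrative baseline only, not the paper's algorithm: it uses a fixed noise scale rather than the dynamic global sensitivity analysis proposed here, and all function names and parameters are assumptions for the sketch.

```python
import math
import random


def laplace(scale):
    # Laplace(0, scale) sampled as the difference of two Exp(1) draws;
    # 1 - random.random() lies in (0, 1], so the logs are always finite.
    e1 = -math.log(1.0 - random.random())
    e2 = -math.log(1.0 - random.random())
    return scale * (e1 - e2)


def private_prefix_sums(stream, epsilon):
    """Release epsilon-DP prefix sums of `stream` (values assumed in [0, 1])
    via the binary tree mechanism: each prefix [1, t] is decomposed into
    O(log T) dyadic intervals, and each interval sum is perturbed with
    Laplace noise exactly once and cached for reuse."""
    T = len(stream)
    levels = max(1, T.bit_length())       # tree depth needed to cover [1, T]
    scale = levels / epsilon              # each element lies in <= `levels` intervals
    cache = {}                            # (start, length) -> noisy interval sum

    def noisy_interval(start, length):
        key = (start, length)
        if key not in cache:
            true_sum = sum(stream[start - 1:start - 1 + length])
            cache[key] = true_sum + laplace(scale)
        return cache[key]

    out = []
    for t in range(1, T + 1):
        # Decompose [1, t] into dyadic intervals using the binary digits of t.
        total, start = 0.0, 1
        for k in range(levels - 1, -1, -1):
            if t & (1 << k):
                total += noisy_interval(start, 1 << k)
                start += 1 << k
        out.append(total)
    return out
```

The key property exploited above is that any prefix sum touches only $O(\log T)$ noisy nodes, so the noise per released statistic grows polylogarithmically in $T$ rather than linearly; the paper's contribution is to shrink the per-node noise further by tracking how the model parameters' global sensitivity decays as the bandit model converges.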