Motivated by increasing privacy concerns in today's data-intensive online learning systems, we consider black-box optimization in the nonparametric Gaussian process setting with a local differential privacy (LDP) guarantee. Specifically, the rewards from each user are corrupted before release to protect privacy, and the learner has access only to the corrupted rewards when minimizing regret. We first derive regret lower bounds that hold for any LDP mechanism and any learning algorithm. We then present three nearly optimal algorithms based on the GP-UCB framework and the Laplace DP mechanism. In the process, we also propose a new Bayesian optimization (BO) method, called MoMA-GP-UCB, based on median-of-means techniques and kernel approximations, which complements previous BO algorithms for heavy-tailed payoffs while reducing computational complexity. Finally, empirical comparisons on both synthetic and real-world datasets highlight the superior performance of MoMA-GP-UCB in both private and non-private scenarios.
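To make the two building blocks named above concrete, the following is a minimal, hedged sketch of (i) a Laplace mechanism that privatizes a bounded reward at the user side before it reaches the learner, and (ii) a median-of-means estimator of the kind MoMA-GP-UCB relies on for robustness to heavy-tailed payoffs. All function names, the clipping bound, and the block count are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np


def laplace_ldp(reward, epsilon, bound=1.0, rng=None):
    """Sketch of user-side epsilon-LDP privatization via the Laplace mechanism.

    A reward clipped to [-bound, bound] has sensitivity 2*bound, so adding
    Laplace noise with scale 2*bound / epsilon yields epsilon-LDP. The
    clipping bound and interface here are illustrative assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    clipped = float(np.clip(reward, -bound, bound))
    return clipped + rng.laplace(scale=2.0 * bound / epsilon)


def median_of_means(samples, n_blocks):
    """Median-of-means estimate of a mean from possibly heavy-tailed samples.

    Split the samples into n_blocks groups, average each group, and return
    the median of the group means; the median step makes the estimate
    robust to a few extreme observations that would skew a plain average.
    """
    samples = np.asarray(samples, dtype=float)
    blocks = np.array_split(samples, n_blocks)
    return float(np.median([block.mean() for block in blocks]))
```

For example, `median_of_means([0, 2, 0, 2, 0, 2], n_blocks=3)` forms three blocks with mean 1.0 each and returns 1.0, while a single corrupted outlier among the samples shifts only one block mean and is discarded by the median.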