We study the problem of preserving privacy while still providing high utility in sequential decision making scenarios in a changing environment. We consider abruptly changing environment: the environment remains constant during periods and it changes at unknown time instants. To formulate this problem, we propose a variant of multi-armed bandits called non-stationary stochastic corrupt bandits. We construct an algorithm called SW-KLUCB-CF and prove an upper bound on its utility using the performance measure of regret. The proven regret upper bound for SW-KLUCB-CF is near-optimal in the number of time steps and matches the best known bound for analogous problems in terms of the number of time steps and the number of changes. Moreover, we present a provably optimal mechanism which can guarantee the desired level of local differential privacy while providing high utility.
翻译:我们研究保护隐私的问题,同时在不断变化的环境中在连续决策情景中仍然提供很高的效用。我们考虑突然变化的环境:环境在一段时间内保持不变,在未知的时间瞬间发生变化。为了解决这个问题,我们提出一个多武装匪徒的变种,称为非静止的随机腐败匪徒。我们建立了一个叫作SW-KLUCB-CF的算法,并用遗憾的性能衡量方法证明其效用的上限。事实证明,SW-KLUCB-CF的上限遗憾程度在时间步骤的数目上是接近最佳的,在时间步骤的数目和变化的数目上与已知的类似问题最接近的一致。此外,我们提出了一个可以保证当地差异隐私达到理想水平,同时提供高度效用的最佳机制。