Reinforcement learning algorithms are widely used in domains where it is desirable to provide a personalized service. In these domains, user data commonly contains sensitive information that needs to be protected from third parties. Motivated by this, we study privacy in the context of finite-horizon Markov Decision Processes (MDPs) by requiring information to be obfuscated on the user side. We formulate this notion of privacy for RL by leveraging the local differential privacy (LDP) framework. We present an optimistic algorithm that simultaneously satisfies LDP requirements and achieves sublinear regret. We also establish a lower bound for regret minimization in finite-horizon MDPs with LDP guarantees. These results show that while LDP is appealing in practical applications, it makes the learning problem inherently harder. In particular, our results demonstrate that the cost of privacy is multiplicative when compared to the non-private setting.
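For reference, the standard notion of local differential privacy underlying this framework can be stated as follows; this is the textbook definition rather than a statement quoted from the paper, and the notation (randomizer \(\mathcal{M}\), input space \(\mathcal{X}\), privacy level \(\varepsilon\)) is generic. A randomized mechanism \(\mathcal{M}\) applied on the user side is \(\varepsilon\)-LDP if
\[
\forall\, x, x' \in \mathcal{X},\ \forall\, S \subseteq \mathrm{range}(\mathcal{M}):\quad
\Pr[\mathcal{M}(x) \in S] \;\le\; e^{\varepsilon}\, \Pr[\mathcal{M}(x') \in S],
\]
so that the output reveals little about which individual input produced it, with smaller \(\varepsilon\) giving stronger privacy. In the RL setting studied here, such a randomizer is applied to the user's data before it leaves the user's side.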