Reinforcement learning (RL) algorithms can be used to provide personalized services, which rely on users' private and sensitive data. To protect the users' privacy, privacy-preserving RL algorithms are in demand. In this paper, we study RL with linear function approximation and local differential privacy (LDP) guarantees. We propose a novel $(\varepsilon, \delta)$-LDP algorithm for learning a class of Markov decision processes (MDPs) dubbed linear mixture MDPs, and obtain an $\tilde{\mathcal{O}}( d^{5/4}H^{7/4}T^{3/4}\left(\log(1/\delta)\right)^{1/4}\sqrt{1/\varepsilon})$ regret, where $d$ is the dimension of the feature mapping, $H$ is the length of the planning horizon, and $T$ is the number of interactions with the environment. We also prove a lower bound $\Omega(dH\sqrt{T}/\left(e^{\varepsilon}(e^{\varepsilon}-1)\right))$ for learning linear mixture MDPs under the $\varepsilon$-LDP constraint. Experiments on synthetic datasets verify the effectiveness of our algorithm. To the best of our knowledge, this is the first provable privacy-preserving RL algorithm with linear function approximation.
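For context, the local model of differential privacy referenced above is the standard one from the DP literature (each user's data is randomized before leaving the user); a minimal statement of the $(\varepsilon, \delta)$-LDP guarantee, not restated from this paper's body, is that a randomized mechanism $\mathcal{M}$ satisfies $(\varepsilon, \delta)$-LDP if
$$\Pr\left[\mathcal{M}(x) \in S\right] \;\le\; e^{\varepsilon}\,\Pr\left[\mathcal{M}(x') \in S\right] + \delta \qquad \text{for all inputs } x, x' \text{ and all measurable sets } S.$$
The lower bound above is stated for the pure ($\delta = 0$) case, written simply as $\varepsilon$-LDP.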