In this paper, we study the problem of regret minimization in reinforcement learning (RL) under differential privacy constraints. This work is motivated by the wide range of RL applications for providing personalized services, where privacy concerns are becoming paramount. In contrast to previous works, we take the first step towards non-tabular RL settings while providing a rigorous privacy guarantee. In particular, we consider the adaptive control of differentially private linear quadratic (LQ) systems. We develop the first private RL algorithm, PRL, which attains sub-linear regret while guaranteeing privacy protection. More importantly, the additional cost due to privacy is only on the order of $\frac{\ln(1/\delta)^{1/4}}{\epsilon^{1/2}}$ given privacy parameters $\epsilon, \delta > 0$. Along the way, we also provide a general procedure for the adaptive control of LQ systems under changing regularizers, which not only generalizes previous non-private controls but also serves as the basis for general private controls.