The offline reinforcement learning (RL) problem is often motivated by the need to learn data-driven decision policies in financial, legal, and healthcare applications. However, the learned policy could retain sensitive information about individuals in the training data (e.g., the treatment and outcome of a patient), and is thus susceptible to various privacy risks. We design offline RL algorithms with differential privacy guarantees that provably prevent such risks. These algorithms also enjoy strong instance-dependent learning bounds under both the tabular and the linear Markov decision process (MDP) settings. Our theory and simulations suggest that the privacy guarantee comes at (almost) no cost in utility compared to the non-private counterpart for a medium-sized dataset.
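As a rough illustration of how a differential privacy guarantee is typically obtained in the tabular setting, the sketch below perturbs state-action visitation counts with the standard Gaussian mechanism. The function name, the unit-sensitivity assumption, and the clipping step are illustrative choices for exposition, not the paper's actual algorithm or calibration.

```python
import numpy as np

def privatize_counts(counts: np.ndarray, epsilon: float, delta: float,
                     sensitivity: float = 1.0, seed: int = 0) -> np.ndarray:
    """Release (epsilon, delta)-DP visitation counts via the Gaussian mechanism.

    Noise scale follows the classical calibration
        sigma = sqrt(2 * ln(1.25 / delta)) * sensitivity / epsilon
    (Dwork & Roth, 2014). `sensitivity` is the L2 sensitivity of the count
    vector to one individual's trajectory -- set to 1 here as a placeholder
    assumption; the right value depends on the data model.
    """
    rng = np.random.default_rng(seed)
    sigma = np.sqrt(2.0 * np.log(1.25 / delta)) * sensitivity / epsilon
    noisy = counts + rng.normal(0.0, sigma, size=counts.shape)
    # Clip at zero so the released counts remain usable as (pseudo-)counts
    # downstream; post-processing does not affect the DP guarantee.
    return np.maximum(noisy, 0.0)

# Example: privatize counts over a toy MDP with 4 states and 2 actions.
counts = np.array([[12., 3.], [7., 0.], [25., 9.], [1., 4.]])
private_counts = privatize_counts(counts, epsilon=1.0, delta=1e-5)
```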