The growing number of applications of Reinforcement Learning (RL) to real-world domains has spurred the development of privacy-preserving techniques, owing to the inherently sensitive nature of the data involved. Most existing work focuses on differential privacy, in which information is revealed in the clear to an agent whose learned model should be robust against information leakage to malicious third parties. Motivated by use cases in which only encrypted data may be shared, such as information from sensitive sites, in this work we consider scenarios in which the inputs themselves are sensitive and cannot be revealed. We develop a simple extension of the MDP framework that provides for the encryption of states. We present a preliminary experimental study of how a DQN agent trained on encrypted states performs in environments with discrete and continuous state spaces. Our results highlight that the agent is still capable of learning in small state spaces even in the presence of non-deterministic encryption, but performance collapses in more complex environments.
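To make the setting concrete, the following is a minimal sketch (not the authors' implementation) of how states might be encrypted before an agent ever observes them: a Gym-style observation wrapper that replaces each state with the bytes of its ciphertext. The wrapper name `EncryptedStateWrapper`, the use of Fernet (a non-deterministic authenticated cipher from the `cryptography` package), and the byte-to-float encoding are all illustrative assumptions; any fixed-width cipher could be substituted.

```python
# A sketch of an MDP with encrypted states: the agent only ever sees ciphertext.
import numpy as np
import gym
from cryptography.fernet import Fernet


class EncryptedStateWrapper(gym.ObservationWrapper):
    """Replaces each observation with the bytes of its ciphertext.

    Because Fernet includes a random IV, the same underlying state maps to a
    different ciphertext on every step (non-deterministic encryption).
    """

    def __init__(self, env, key=None):
        super().__init__(env)
        self._cipher = Fernet(key or Fernet.generate_key())
        # Encrypt one sample observation to fix the ciphertext width, so the
        # new observation space has a constant shape a DQN can consume.
        probe = self._encrypt(env.observation_space.sample())
        self.observation_space = gym.spaces.Box(
            low=0.0, high=1.0, shape=probe.shape, dtype=np.float32
        )

    def _encrypt(self, obs):
        # Works for both discrete (scalar) and continuous (array) states.
        token = self._cipher.encrypt(np.asarray(obs, dtype=np.float32).tobytes())
        # Map ciphertext bytes to floats in [0, 1] for a standard network input.
        return np.frombuffer(token, dtype=np.uint8).astype(np.float32) / 255.0

    def observation(self, obs):
        return self._encrypt(obs)


# Usage with the classic gym API (gym < 0.26 returns the observation alone):
env = EncryptedStateWrapper(gym.make("CartPole-v1"))
obs = env.reset()  # ciphertext bytes, never the plaintext state
```

Under this wrapping, any off-the-shelf DQN can be trained unchanged; the non-determinism of the cipher is what makes the learning problem hard, since a single underlying state no longer has a unique representation.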