While deep reinforcement learning has seen significant research advances, a major challenge to its widespread industrial adoption has recently surfaced but remains little explored: its potential vulnerability to privacy breaches. In particular, the literature contains no concrete adversarial attack strategies tailored for studying the vulnerability of deep reinforcement learning algorithms to membership inference attacks. To address this gap, we propose an adversarial attack framework designed to test this vulnerability. More specifically, we design a series of experiments to investigate the impact of temporal correlation, which naturally exists in reinforcement learning training data, on the probability of information leakage. Furthermore, we study the differences in performance between \emph{collective} and \emph{individual} membership attacks against deep reinforcement learning algorithms. Experimental results show that the proposed adversarial attack framework is surprisingly effective at inferring the data used during deep reinforcement learning training, with an accuracy exceeding $84\%$ in individual mode and $97\%$ in collective mode on two different control tasks in OpenAI Gym. This raises serious privacy concerns about deploying models produced by deep reinforcement learning. Moreover, we show that the learning state of a reinforcement learning algorithm significantly influences the level of the privacy breach.