While significant research advances have been made in the field of deep reinforcement learning, there have been no concrete adversarial attack strategies in the literature tailored to studying the vulnerability of deep reinforcement learning algorithms to membership inference attacks. In such attacks, the adversary targets the collected input data on which the deep reinforcement learning algorithm has been trained. To address this gap, we propose an adversarial attack framework designed to test the vulnerability of a state-of-the-art deep reinforcement learning algorithm to a membership inference attack. In particular, we design a series of experiments to investigate the impact of temporal correlation, which naturally exists in reinforcement learning training data, on the probability of information leakage. Moreover, we compare the performance of \emph{collective} and \emph{individual} membership inference attacks against the deep reinforcement learning algorithm. Experimental results show that the proposed adversarial attack framework is surprisingly effective at inferring training data, achieving accuracy exceeding $84\%$ in the individual mode and $97\%$ in the collective mode across three different continuous control MuJoCo tasks, which raises serious privacy concerns. Finally, we show that the learning state of the reinforcement learning algorithm significantly influences the level of privacy breach.