Kickstarting deep reinforcement learning algorithms establish a teacher-student relationship among agents, allowing a well-performing teacher to share demonstrations with a student to expedite the student's training. Despite the known benefits, these demonstrations may contain sensitive information about the teacher's training data, and existing kickstarting methods take no measures to protect it. We therefore use the framework of differential privacy to develop a mechanism that shares the teacher's demonstrations with the student in a privacy-preserving manner. The mechanism lets the teacher choose the accuracy of its demonstrations in relation to the privacy budget it consumes, thereby granting the teacher full control over its data privacy. We then develop a kickstarted deep reinforcement learning algorithm for the student that is privacy-aware: its objective is calibrated with the parameters of the teacher's privacy mechanism. This privacy-aware design makes it possible to kickstart the student's learning despite the perturbations induced by the privacy mechanism. Numerical experiments highlight three empirical results: (i) the algorithm succeeds in expediting the student's learning, (ii) the student converges to a performance level that it could not reach without the demonstrations, and (iii) the student maintains its enhanced performance even after the teacher stops sharing useful demonstrations due to its privacy budget constraints.
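To make the accuracy-versus-budget trade-off concrete, the following is a minimal sketch of one way a teacher could release noisy demonstrations under a fixed privacy budget. It is not the mechanism from this work, whose details the abstract does not specify. It assumes the standard Laplace mechanism with basic sequential composition, and the names `PrivateTeacher`, `share`, `total_epsilon`, and `sensitivity` are hypothetical.

```python
import random


class PrivateTeacher:
    """Hypothetical teacher that releases noisy action values under a total budget.

    Assumption: a plain Laplace mechanism with sequential composition,
    chosen for illustration only.
    """

    def __init__(self, total_epsilon, sensitivity=1.0, seed=0):
        self.remaining = total_epsilon   # privacy budget still available
        self.sensitivity = sensitivity   # assumed L1 sensitivity of the released vector
        self.rng = random.Random(seed)

    def _laplace(self, scale):
        # The difference of two i.i.d. exponentials with rate 1/scale
        # is Laplace-distributed with location 0 and the given scale.
        rate = 1.0 / scale
        return self.rng.expovariate(rate) - self.rng.expovariate(rate)

    def share(self, action_values, epsilon):
        """Return a perturbed copy of action_values, or None if the budget is too small."""
        if epsilon <= 0 or epsilon > self.remaining:
            return None
        self.remaining -= epsilon             # sequential composition: costs add up
        scale = self.sensitivity / epsilon    # larger epsilon means less noise
        return [v + self._laplace(scale) for v in action_values]


teacher = PrivateTeacher(total_epsilon=1.0)
values = [0.2, 1.5, -0.3]
print(teacher.share(values, epsilon=0.5))  # noisy demonstration
print(teacher.share(values, epsilon=0.5))  # noisy demonstration, budget now spent
print(teacher.share(values, epsilon=0.5))  # None: nothing left to share
```

In this sketch the teacher picks `epsilon` per demonstration: a larger value yields a more accurate demonstration but exhausts the budget sooner, after which the student receives nothing useful and must rely on what it has already learned.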