Reinforcement learning (RL) offers numerous benefits over rule-based approaches in a variety of applications. Privacy concerns have grown with the widespread use of RL trained on privacy-sensitive data in IoT devices, especially in human-in-the-loop systems. On the one hand, RL methods enhance the user experience by adapting to the highly dynamic nature of humans. On the other hand, trained policies can leak the user's private information. Recent work has therefore focused on designing privacy-aware RL algorithms that maintain acceptable system utility. A central challenge in designing privacy-aware RL, especially for human-in-the-loop systems, is that humans have intrinsic variability, and their preferences and behavior evolve over time. The effect of a given privacy-leak mitigation can differ across humans, and for the same human over time. Hence, we cannot design one fixed privacy-aware RL model that fits all. To that end, we propose adaPARL, an adaptive approach for privacy-aware RL, especially for human-in-the-loop IoT systems. adaPARL provides a personalized privacy-utility trade-off that depends on human behavior and preference. We validate adaPARL on two IoT applications, namely (i) Human-in-the-Loop Smart Home and (ii) Human-in-the-Loop Virtual Reality (VR) Smart Classroom. Results on these two applications demonstrate the generality of adaPARL and its ability to provide a personalized privacy-utility trade-off. On average, for the first application, adaPARL improves the utility by $57\%$ over the baseline and by $43\%$ over randomization. adaPARL also reduces the privacy leak by $23\%$ on average. For the second application, adaPARL decreases the privacy leak to $44\%$ before the utility drops by $15\%$.