Meta reinforcement learning (meta RL), which combines meta-learning with reinforcement learning (RL), enables an agent to adapt to different tasks using only a few samples. However, this sampling-based adaptation also makes meta RL vulnerable to adversarial attacks. By manipulating the reward feedback from the sampling processes in meta RL, an attacker can mislead the agent into building wrong knowledge from its training experience, which degrades the agent's performance on new tasks after adaptation. This paper provides a game-theoretic underpinning for understanding this type of security risk. In particular, we formally define the sampling attack model as a Stackelberg game between the attacker and the agent, which yields a minimax formulation. This formulation leads to two online attack schemes, Intermittent Attack and Persistent Attack, which enable the attacker to learn an optimal sampling attack, defined as an $\epsilon$-first-order stationary point, within $\mathcal{O}(\epsilon^{-2})$ iterations. These attack schemes free-ride on the agent's learning process: they run concurrently with it and require no extra interactions with the environment. Numerical experiments corroborate the convergence results and show that a minor effort by the attacker can significantly degrade learning performance, and that the minimax approach can also help robustify meta RL algorithms.
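As a rough illustration of the minimax formulation and the stationarity notion mentioned above, the following is a generic sketch rather than the paper's exact definition. All symbols are assumptions introduced here: $\theta$ denotes the agent's meta-parameter, $\delta$ the attacker's reward-manipulation parameter, $\Delta$ the set of admissible attacks, and $J(\theta, \delta)$ the meta-objective that the agent maximizes from the manipulated reward samples. Under a zero-sum assumption, the attacker's problem could be written as

$$
\min_{\delta \in \Delta} \; \Phi(\delta), \qquad \Phi(\delta) := \max_{\theta} \; J(\theta, \delta),
$$

and, in the unconstrained case with $\Phi$ differentiable, a point $\hat{\delta}$ would be called an $\epsilon$-first-order stationary point if

$$
\left\| \nabla \Phi(\hat{\delta}) \right\| \le \epsilon .
$$

Read this way, the $\mathcal{O}(\epsilon^{-2})$ bound says that the number of attack iterations needed to reach such a point grows at most quadratically in $1/\epsilon$; the precise objective, constraint set, and stationarity measure used in the paper may differ from this sketch.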