In targeted poisoning attacks, an attacker manipulates an agent-environment interaction to force the agent into adopting a policy of interest, called target policy. Prior work has primarily focused on attacks that modify standard MDP primitives, such as rewards or transitions. In this paper, we study targeted poisoning attacks in a two-agent setting where an attacker implicitly poisons the effective environment of one of the agents by modifying the policy of its peer. We develop an optimization framework for designing optimal attacks, where the cost of the attack measures how much the solution deviates from the assumed default policy of the peer agent. We further study the computational properties of this optimization framework. Focusing on a tabular setting, we show that in contrast to poisoning attacks based on MDP primitives (transitions and (unbounded) rewards), which are always feasible, it is NP-hard to determine the feasibility of implicit poisoning attacks. We provide characterization results that establish sufficient conditions for the feasibility of the attack problem, as well as an upper and a lower bound on the optimal cost of the attack. We propose two algorithmic approaches for finding an optimal adversarial policy: a model-based approach with tabular policies and a model-free approach with parametric/neural policies. We showcase the efficacy of the proposed algorithms through experiments.
翻译:在定点中毒袭击中,攻击者操纵一种代理-环境互动,迫使代理人采取一项利益政策,称为目标政策。先前的工作主要侧重于改变标准 MDP原始物的攻击,如奖赏或过渡。在本文中,我们研究在两个试剂环境下的定点中毒袭击,攻击者通过修改其同行的政策间接毒害其中一名代理人的有效环境。我们为设计最佳攻击制定了一个优化框架,攻击措施的成本大大偏离了同侪代理人的假定默认政策。我们进一步研究了这一优化框架的计算特性。我们侧重于一个表格设置,显示与基于MDP原始物(过渡和(无限制)奖赏)的中毒袭击相比,我们总是可行的,很难确定隐性中毒袭击的可行性。我们提供特征分析结果,为攻击问题的可行性创造充分条件,以及攻击的最佳成本上限和下限。我们提出了两种算法方法,以找到一种最佳的对抗政策:一种基于模型的方法,以表层/模型的试验方法,通过模型的测试政策,采用模型式的检验方法。我们提出了一种不设底的实验法。</s>