We investigate how effective an attacker can be when it learns only from its victim's actions, without access to the victim's reward. Our motivating scenario is an attacker that wants to behave strategically when the victim's motivations are unknown. We argue that one heuristic available to such an attacker is to maximize the entropy of the victim's policy. The policy is generally not obfuscated, so it may be extracted simply by passively observing the victim. We provide such a strategy in the form of a reward-free exploration algorithm that maximizes the attacker's entropy during the exploration phase and then maximizes the victim's empirical entropy during the planning phase. In our experiments, the victim agents are subverted through policy entropy maximization, suggesting that an attacker might not need access to the victim's reward to succeed. Hence reward-free attacks, which are based only on observed behavior, show that an attacker can feasibly act strategically without knowledge of the victim's motives, even when the victim's reward information is protected.
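As a rough illustration of the quantity the planning phase targets, the sketch below estimates the victim's empirical policy entropy per state from passively observed (state, action) pairs. It is a minimal sketch under stated assumptions, not the algorithm evaluated in this work: the function name `empirical_policy_entropy`, the discrete state and action labels, and the plain count-based estimator are all hypothetical choices made for the example.

```python
import math
from collections import Counter, defaultdict


def empirical_policy_entropy(observations):
    """Estimate the victim's per-state policy entropy (in nats).

    `observations` is an iterable of (state, action) pairs gathered by
    watching the victim. Both are assumed to be discrete and hashable;
    this is an illustrative assumption, not a detail from the paper.
    """
    # Count how often each action was observed in each state.
    counts = defaultdict(Counter)
    for state, action in observations:
        counts[state][action] += 1

    entropies = {}
    for state, action_counts in counts.items():
        total = sum(action_counts.values())
        # Shannon entropy of the empirical action distribution.
        entropies[state] = -sum(
            (n / total) * math.log(n / total)
            for n in action_counts.values()
        )
    return entropies


# Hypothetical observations: the victim is undecided in "s0"
# and deterministic in "s1".
observed = [("s0", "left"), ("s0", "right"), ("s1", "left"), ("s1", "left")]
entropy_by_state = empirical_policy_entropy(observed)
```

Under these assumptions, an attacker could treat `entropy_by_state` as a stand-in objective and prefer behavior that steers the victim toward states where the estimate is high. No reward signal from the victim is used at any point.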