We expose the danger of reward poisoning in offline multi-agent reinforcement learning (MARL), whereby an attacker can modify the reward vectors given to different learners in an offline dataset while incurring a poisoning cost. Based on the poisoned dataset, all rational learners using some confidence-bound-based MARL algorithm will infer that a target policy (chosen by the attacker and not necessarily a solution concept of the original game) is the Markov perfect dominant strategy equilibrium of the underlying Markov Game, and hence they will adopt this potentially damaging target policy in the future. We characterize the exact conditions under which the attacker can install a target policy. We further show how the attacker can formulate a linear program to minimize its poisoning cost. Our work shows the need for robust MARL against adversarial attacks.
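As a minimal illustrative sketch (not the paper's exact formulation or notation), such an attack can be cast as a linear program: the attacker chooses poisoned rewards $\hat{r}$ close to the original rewards $r$ in total $\ell_1$ cost, subject to constraints that make the target policy $\pi^\dagger$ strictly dominant for every learner with a margin $\iota$ covering the confidence-bound widths. The symbols $\hat{r}$, $r$, $\pi^\dagger$, $\hat{Q}_i$, and $\iota$ are assumed here for illustration; since the continuation policy is held fixed, the action values $\hat{Q}_i$ are affine in the poisoned rewards, so both the objective and the constraints below are linear:
\[
\begin{aligned}
\min_{\hat{r}} \quad & \sum_{k} \bigl\lVert \hat{r}_k - r_k \bigr\rVert_1
  && \text{(total poisoning cost over the dataset)} \\
\text{s.t.} \quad & \hat{Q}_i\bigl(s, \pi_i^\dagger(s), a_{-i}\bigr) \;\ge\; \hat{Q}_i\bigl(s, a_i, a_{-i}\bigr) + \iota
  && \forall\, i,\ s,\ a_{-i},\ a_i \neq \pi_i^\dagger(s),
\end{aligned}
\]
where $\hat{Q}_i$ denotes agent $i$'s action value computed from the poisoned rewards, $k$ ranges over the episodes in the offline dataset, and the margin $\iota$ absorbs the learners' confidence-bound widths, so that dominance of $\pi^\dagger$ survives any value estimate consistent with those bounds, which is what drives every rational confidence-bound-based learner to adopt the target policy.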