Self-play reinforcement learning has achieved state-of-the-art, and often superhuman, performance in a variety of zero-sum games. Yet prior work has found that policies that are highly capable against regular opponents can fail catastrophically against adversarial policies: opponents trained explicitly against the victim. Prior defenses using adversarial training were able to make the victim robust to a specific adversary, but the victim remained vulnerable to new ones. We conjecture this limitation is due to insufficient diversity of adversaries seen during training. We propose a defense using population-based training to pit the victim against a diverse set of opponents. We evaluate this defense's robustness against new adversaries in two low-dimensional environments. Our defense increases robustness against adversaries, as measured by the number of attacker training timesteps needed to exploit the victim. Furthermore, we show that robustness is correlated with the size of the opponent population.
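As a rough illustration of the population-based defense the abstract describes, below is a minimal Python sketch of such a training loop. All names here (`Policy`, `play_episode`, `population_training`) are hypothetical placeholders, not the paper's actual implementation; the key idea shown is simply that each episode the victim faces an opponent sampled from a population, rather than a single fixed adversary.

```python
import random

class Policy:
    """Hypothetical stand-in for an RL policy (e.g., a PPO agent)."""
    def __init__(self, name):
        self.name = name

    def update(self, trajectory):
        # A real implementation would take a gradient step here.
        pass

def play_episode(victim, opponent):
    """Hypothetical zero-sum environment rollout.

    Returns (victim_trajectory, opponent_trajectory); stubbed out here.
    """
    return [], []

def population_training(victim, population_size, total_episodes):
    # Maintain a population of opponents so the victim is exposed to
    # diverse strategies; this diversity is the conjectured source of
    # robustness to new adversaries.
    opponents = [Policy(f"opponent_{i}") for i in range(population_size)]
    for _ in range(total_episodes):
        # Sample a fresh opponent each episode instead of training
        # against one fixed adversary.
        opponent = random.choice(opponents)
        victim_traj, opponent_traj = play_episode(victim, opponent)
        victim.update(victim_traj)
        opponent.update(opponent_traj)
    return victim

victim = population_training(Policy("victim"),
                             population_size=8,
                             total_episodes=10_000)
```

Under this sketch, increasing `population_size` corresponds to the abstract's finding that robustness correlates with the size of the opponent population.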