There has been emerging interest in using transductive learning for adversarial robustness (Goldwasser et al., NeurIPS 2020; Wu et al., ICML 2020). Compared to traditional "test-time" defenses, these defense mechanisms "dynamically retrain" the model on the test-time input via transductive learning; theoretically, attacking these defenses reduces to bilevel optimization, which appears to raise the difficulty of adaptive attacks. In this paper, we first formalize and analyze modeling aspects of transductive robustness. We then propose the principle of attacking model space for solving bilevel attack objectives, and present an instantiation of this principle that breaks previous transductive defenses. These attacks thus point to significant difficulties in using transductive learning to improve adversarial robustness. Nevertheless, we present new theoretical and empirical evidence in support of the utility of transductive learning.
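To make the bilevel structure concrete, one illustrative formulation (notation is ours, not the paper's) is: the attacker perturbs a clean test set $U$ into $U'$ within an allowed neighborhood $N(U)$, the defender transductively retrains on $U'$, and the attacker seeks perturbations that maximize the loss of the resulting model:

$$
\max_{U' \in N(U)} \; L\big(\theta^*(U'),\, U'\big)
\quad \text{s.t.} \quad
\theta^*(U') \in \arg\min_{\theta} \; L_{\mathrm{train}}\big(\theta;\, D \cup U'\big),
$$

where $D$ is the (labeled) training set, $\theta^*(U')$ is the model the defense produces after retraining on the perturbed test input, and $L$, $L_{\mathrm{train}}$ are the attack and training losses. The difficulty for adaptive attacks is that the outer objective depends on $U'$ only through the inner optimization, so gradients must pass through the retraining procedure itself.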