Transfer learning is a prevalent technique for efficiently generating new models (Student models) from the knowledge transferred by a pre-trained model (Teacher model). However, Teacher models are often publicly shared and reused, which inevitably introduces vulnerabilities that enable severe attacks against transfer learning systems. In this paper, we take a first step towards mitigating one of the most advanced misclassification attacks in transfer learning. We design a distilled differentiator via activation-based network pruning to weaken attack transferability while retaining accuracy. We adopt an ensemble of variant differentiators to improve defence robustness. To avoid a bloated ensemble at inference time, we propose a two-phase defence: inference on the Student model is first performed to narrow down the candidate differentiators, and then only a small, fixed number of them are assembled to validate clean inputs or reject adversarial ones. Our comprehensive evaluations on both large and small image recognition tasks confirm that Student models equipped with our defence of only 5 differentiators are immune to over 90% of adversarial inputs, with an accuracy loss of less than 10%. Our comparison also demonstrates that our design outperforms prior defences.
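The two-phase defence can be illustrated with a minimal sketch, assuming each differentiator is a pruned Student variant indexed by the class it guards and that unanimous agreement is required for acceptance; the names student, differentiators, and K below are hypothetical, not taken from the paper.

```python
# A minimal sketch of the two-phase defence. The names `student`,
# `differentiators`, and `K` are illustrative, not from the paper.
import torch

K = 5  # fixed number of differentiators assembled at inference time

def defend(x, student, differentiators, k=K):
    """Return the Student's prediction if the assembled differentiators
    unanimously agree with it; otherwise reject the input as adversarial."""
    with torch.no_grad():
        # Phase 1: Student inference narrows the candidates to the
        # differentiators associated with the predicted class.
        y = student(x).argmax(dim=-1).item()
        candidates = differentiators[y][:k]
        # Phase 2: a small, fixed ensemble validates or rejects the input.
        votes = sum(int(d(x).argmax(dim=-1).item() == y) for d in candidates)
    return y if votes == len(candidates) else None  # None signals rejection
```

Requiring unanimity here is one possible voting rule; a majority threshold would trade some robustness against adversarial inputs for fewer false rejections of clean ones.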