Model adaptation aims to solve the domain transfer problem under the constraint that only the pretrained source models are accessible. With growing concerns about data privacy and transmission efficiency, this paradigm has recently gained popularity. This paper studies the vulnerability of model adaptation algorithms to universal attacks transferred from the source domain, a risk that arises when the model provider is malicious. We explore both universal adversarial perturbations and backdoor attacks as loopholes on the source side and discover that they survive in the target models after adaptation. To address this issue, we propose a model preprocessing framework, named AdaptGuard, to improve the security of model adaptation algorithms. AdaptGuard avoids direct use of the risky source parameters through knowledge distillation and exploits pseudo adversarial samples under an adjusted radius to enhance robustness. AdaptGuard is a plug-and-play module that requires neither robust pretrained models nor any changes to the subsequent model adaptation algorithms. Extensive results on three commonly used datasets with two popular adaptation methods validate that AdaptGuard can effectively defend against universal attacks while maintaining clean accuracy in the target domain. We hope this research will shed light on the safety and robustness of transfer learning.
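To make the framework concrete, below is a minimal PyTorch-style sketch of the two ingredients named above: a student distilled from the (possibly compromised) source model on unlabeled target data, trained on pseudo adversarial samples generated under a reduced radius. All names and hyperparameters here (`eps`, `alpha`, `steps`, temperature `T`) are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def pgd_perturb(model, x, y_soft, eps=2/255, alpha=0.5/255, steps=5):
    """Generate pseudo adversarial samples within an adjusted (small) radius.

    Hypothetical PGD-style inner loop; eps/alpha/steps are assumed values.
    """
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        logits = model(x + delta)
        # Push the student away from its current soft targets (KL divergence).
        loss = F.kl_div(F.log_softmax(logits, dim=1), y_soft,
                        reduction="batchmean")
        loss.backward()
        delta.data = (delta + alpha * delta.grad.sign()).clamp(-eps, eps)
        delta.grad.zero_()
    return (x + delta.detach()).clamp(0, 1)

def adaptguard_step(student, source_model, x, optimizer, T=2.0):
    """One distillation step: the student never copies source parameters."""
    with torch.no_grad():
        # Soft pseudo labels from the risky source model (used as teacher only).
        teacher_prob = F.softmax(source_model(x) / T, dim=1)
    x_adv = pgd_perturb(student, x, teacher_prob)
    loss = F.kl_div(F.log_softmax(student(x_adv) / T, dim=1),
                    teacher_prob, reduction="batchmean") * T * T
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the student's weights are initialized independently of the source model, trigger patterns or perturbations tied to the source parameters are less likely to carry over; the preprocessed student can then be handed to any downstream adaptation method unchanged.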