Deep neural networks (DNNs) are vulnerable to backdoor attacks, where adversaries embed a hidden backdoor trigger during the training process for malicious prediction manipulation. These attacks pose great threats to the application of DNNs in the real-world machine learning as a service (MLaaS) setting, where the deployed model is a full black box and users can only query it to obtain predictions. Many defenses exist to reduce backdoor threats. However, almost all of them cannot be adopted in MLaaS scenarios, since they require access to, or even modification of, the suspicious model. In this paper, we propose a simple yet effective black-box input-level backdoor detection method, called SCALE-UP, which requires only the predicted labels and thereby alleviates this problem. Specifically, we identify and filter malicious testing samples by analyzing their prediction consistency under pixel-wise amplification. Our defense is motivated by an intriguing observation (dubbed scaled prediction consistency) that, when all pixel values are amplified, the predictions of poisoned samples remain significantly more consistent than those of benign ones. We also provide theoretical foundations to explain this phenomenon. Extensive experiments on benchmark datasets verify the effectiveness and efficiency of our defense and its resistance to potential adaptive attacks. Our code is available at https://github.com/JunfengGo/SCALE-UP.
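To make the scaled prediction consistency idea concrete, the following is a minimal sketch of a label-only detection routine under simple assumptions: images are float arrays in [0, 1], `predict_fn` wraps black-box, label-only access to the deployed model, and the amplification factors and threshold `tau` are illustrative choices rather than values prescribed by the paper.

```python
# Hypothetical sketch of a scaled prediction consistency (SPC) check.
# Assumes images are float arrays in [0, 1]; `predict_fn`, the amplification
# factors, and `tau` are illustrative placeholders, not the paper's exact setup.
import numpy as np
from typing import Callable, Sequence


def spc_score(
    image: np.ndarray,
    predict_fn: Callable[[np.ndarray], int],
    factors: Sequence[float] = (2.0, 3.0, 4.0, 5.0),
) -> float:
    """Fraction of pixel-amplified copies that keep the original predicted label."""
    base_label = predict_fn(image)
    consistent = 0
    for n in factors:
        # Multiply every pixel by n and clip back to the valid range.
        amplified = np.clip(image * n, 0.0, 1.0)
        if predict_fn(amplified) == base_label:
            consistent += 1
    return consistent / len(factors)


def is_suspicious(image: np.ndarray, predict_fn, tau: float = 0.75) -> bool:
    # Poisoned inputs tend to keep their (target) label under amplification,
    # so a high SPC score flags the sample as potentially backdoored.
    return spc_score(image, predict_fn) >= tau
```

In practice, one would likely batch the model queries and calibrate the detection threshold on held-out benign samples; this sketch only illustrates the label-only, query-based nature of the check.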