Deep neural networks (DNNs), while accurate, are expensive to train. Many practitioners therefore outsource the training process to third parties or use pre-trained DNNs. This practice makes DNNs vulnerable to backdoor attacks: the third party who trains the model may act maliciously and inject hidden behaviors into an otherwise accurate model. Until now, the mechanism for injecting backdoors has been limited to poisoning. We argue that such a supply-chain attacker has more attack techniques available. To study this hypothesis, we introduce a handcrafted attack that directly manipulates the parameters of a pre-trained model to inject backdoors. Our handcrafted attacker has more degrees of freedom in manipulating model parameters than a poisoning attacker. This makes it difficult for a defender to identify or remove the manipulations with straightforward methods, such as statistical analysis, adding random noise to model parameters, or clipping their values to a certain range. Further, our attacker can combine the handcrafting process with additional techniques, e.g., jointly optimizing a trigger pattern, to inject backdoors into complex networks effectively (the meet-in-the-middle attack). In evaluations, our handcrafted backdoors remain effective across four datasets and four network architectures, with a success rate above 96%. Our backdoored models are resilient to parameter-level backdoor removal techniques and can evade existing defenses by slightly changing the backdoor attack configurations. Moreover, we demonstrate the feasibility of suppressing unwanted behaviors otherwise caused by poisoning. Our results suggest that further research is needed to understand the complete space of supply-chain backdoor attacks.
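For readers unfamiliar with the parameter-level defenses the abstract mentions (adding random noise to model parameters, or clipping their values to a range), the following is a minimal sketch of what such operations could look like. It assumes the model's parameters are available as a list of NumPy arrays. The function names, the toy parameters, and the chosen noise scale and clipping bounds are hypothetical illustrations, not the procedure or settings used in the paper.

```python
import numpy as np


def add_noise(params, sigma, seed=0):
    """Return copies of the parameter arrays with zero-mean Gaussian noise added.

    `sigma` is an assumed noise scale chosen by the defender.
    """
    rng = np.random.default_rng(seed)
    return [p + rng.normal(0.0, sigma, size=p.shape) for p in params]


def clip_params(params, low, high):
    """Return copies of the parameter arrays with values clipped to [low, high]."""
    return [np.clip(p, low, high) for p in params]


# Hypothetical toy "model": two small weight arrays, one containing an
# unusually large value (3.0) of the kind a naive manipulation might leave.
params = [
    np.array([[0.1, -0.2], [3.0, 0.05]]),
    np.array([0.0, -0.4]),
]

noisy = add_noise(params, sigma=0.01)               # perturb every parameter slightly
clipped = clip_params(params, low=-0.5, high=0.5)   # squash out-of-range values
```

Both operations only act on conspicuous or fragile parameter changes. The abstract's claim is that a handcrafted attacker has enough freedom to avoid producing such changes, which is why these straightforward methods struggle.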