Deep neural networks (DNNs) have been shown to be vulnerable to backdoor attacks. A backdoor is typically embedded in a target DNN by injecting a backdoor trigger into training examples, which causes the target DNN to misclassify any input carrying that trigger. Existing backdoor detection methods often require access to the original poisoned training data, the parameters of the target DNN, or the predictive confidence for each given input, which is impractical in many real-world applications, e.g., DNNs deployed on-device. We address the black-box hard-label backdoor detection problem, where the DNN is fully black-box and only its final output label is accessible. We approach this problem from an optimization perspective and show that the objective of backdoor detection is bounded by an adversarial objective. Further theoretical and empirical studies reveal that this adversarial objective leads to a solution with a highly skewed distribution: a singularity is often observed in the adversarial map of a backdoor-infected example, which we call the adversarial singularity phenomenon. Based on this observation, we propose adversarial extreme value analysis (AEVA) to detect backdoors in black-box neural networks. AEVA is based on an extreme value analysis of the adversarial map, which is computed via Monte Carlo gradient estimation. Extensive experiments across multiple popular tasks and backdoor attacks show that our approach is effective in detecting backdoor attacks in the black-box hard-label scenario.
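To make the two ingredients named above concrete, the following is a minimal sketch of Monte Carlo gradient estimation from hard labels, followed by a simple extreme value check on the resulting map. It is not the authors' implementation: the abstract does not specify the AEVA pipeline, so the function names (`mc_gradient_estimate`, `mad_anomaly_index`), the toy classifier `toy_predict`, the sampling parameters, and the choice of a median-absolute-deviation (MAD) score as the extreme value statistic are all assumptions made for illustration.

```python
import numpy as np


def mc_gradient_estimate(predict_label, x, target, n_samples=2000, sigma=0.1, seed=0):
    """Estimate a direction that pushes x toward `target` using only hard labels.

    Assumption: a generic Monte Carlo estimator with Gaussian probes, not
    necessarily the estimator used by AEVA.
    """
    rng = np.random.default_rng(seed)
    u = rng.standard_normal((n_samples,) + x.shape)  # random probe directions
    # +1 if the perturbed input is labeled `target`, otherwise -1.
    phi = np.array(
        [1.0 if predict_label(x + sigma * ui) == target else -1.0 for ui in u]
    )
    phi -= phi.mean()  # subtract a baseline to reduce variance
    return np.tensordot(phi, u, axes=1) / n_samples


def mad_anomaly_index(values):
    """Robust outlier score: distance from the median in units of scaled MAD."""
    values = np.asarray(values, dtype=float)
    med = np.median(values)
    mad = np.median(np.abs(values - med))
    return (values - med) / (1.4826 * mad + 1e-12)


# Hypothetical hard-label black box: coordinate 3 acts like a backdoor trigger.
def toy_predict(x):
    return int(x[3] > 0.5)


x = np.zeros(16)
x[3] = 0.45  # start close to the decision boundary so probes can flip the label

g = mc_gradient_estimate(toy_predict, x, target=1)
idx = mad_anomaly_index(np.abs(g))

print("largest-magnitude coordinate:", int(np.argmax(np.abs(g))))
print("anomaly index at that coordinate:", float(idx.max()))
```

In this toy setting the estimated map concentrates almost all of its mass on the single trigger coordinate, which is the kind of skew the singularity observation describes. A real detector would query an actual model and, presumably, compare such peak statistics across labels; those details are not given in the abstract.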