AEVA: Black-box Backdoor Detection Using Adversarial Extreme Value Analysis

Deep neural networks (DNNs) have been shown to be vulnerable to backdoor attacks. A backdoor is typically embedded in a target DNN by injecting a backdoor trigger into training examples, which causes the DNN to misclassify any input that carries the trigger. Existing backdoor detection methods often require access to the original poisoned training data, the parameters of the target DNN, or the predictive confidence for each input, all of which are impractical in many real-world applications, e.g., DNNs deployed on-device. We address the black-box hard-label backdoor detection problem, where the DNN is fully black-box and only its final output label is accessible. We approach this problem from an optimization perspective and show that the objective of backdoor detection is bounded by an adversarial objective. Further theoretical and empirical studies reveal that this adversarial objective leads to a solution with a highly skewed distribution; a singularity is often observed in the adversarial map of a backdoor-infected example, which we call the adversarial singularity phenomenon. Based on this observation, we propose adversarial extreme value analysis (AEVA) to detect backdoors in black-box neural networks. AEVA is based on an extreme value analysis of the adversarial map, which is computed via Monte Carlo gradient estimation. Extensive experiments across multiple popular tasks and backdoor attacks show that our approach is effective in detecting backdoor attacks in the black-box hard-label setting.
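The two ingredients named in the abstract, a Monte Carlo gradient estimate from hard labels and an extreme value check on the resulting map, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function names `estimate_direction` and `anomaly_index`, the sign-based estimator (a common choice for hard-label settings, meaningful only near a decision boundary), and the median-absolute-deviation score.

```python
import numpy as np

def estimate_direction(label_fn, x, target, n_samples=2000, sigma=0.1, rng=None):
    """Monte Carlo estimate of a direction toward `target` using hard labels only.

    Hypothetical helper: `label_fn` returns just the predicted label, and `x`
    is assumed to lie near the decision boundary of `target`.
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.standard_normal((n_samples,) + x.shape)
    # +1 when the perturbed input is labeled `target`, otherwise -1.
    signs = np.array(
        [1.0 if label_fn(x + sigma * u) == target else -1.0 for u in noise]
    )
    signs -= signs.mean()  # subtract a baseline to reduce variance
    weights = signs.reshape((-1,) + (1,) * x.ndim)
    return (weights * noise).mean(axis=0)

def anomaly_index(peaks):
    """Score each value by its distance from the median in MAD units (assumed rule)."""
    peaks = np.asarray(peaks, dtype=float)
    med = np.median(peaks)
    mad = 1.4826 * np.median(np.abs(peaks - med))
    return (peaks - med) / (mad + 1e-12)

# Toy hard-label model: the label depends on a single coordinate only.
def toy_label(x):
    return 1 if x[3] > 0.5 else 0

x0 = np.zeros(16)
x0[3] = 0.5  # a point on the toy decision boundary
g = estimate_direction(toy_label, x0, target=1, rng=np.random.default_rng(0))
peak_coordinate = int(np.argmax(np.abs(g)))

# Per-label peak values (made-up numbers); one label stands out as an extreme value.
scores = anomaly_index([1.0, 1.1, 0.9, 1.05, 5.0])
flagged = int(np.argmax(scores))
```

In the toy model the estimated map concentrates on the one coordinate that controls the label, which mirrors the kind of peaked, skewed map the abstract describes. The made-up peak values show how a single extreme entry can be singled out by a robust outlier score.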

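Related content

Black box

In science, computing, and engineering, a black box is a device, system, or object that can be viewed in terms of its inputs and outputs (or transfer characteristics) without any knowledge of its internal workings. Its implementation is "opaque" (black). Almost anything can be referred to as a black box: a transistor, an engine, an algorithm, the human brain, an institution, or a government. To analyze something modeled as an open system with the typical "black-box approach", only its stimulus/response behavior is considered, in order to infer the (unknown) box. The usual representation of such a black-box system is a data flow diagram centered on the box. The opposite of a black box is a system whose inner components or logic are available for inspection, commonly called a white box (sometimes also a "clear box" or "glass box").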

[ACL2020-CMU] Weight poisoning attacks on pre-trained models (Weight Poisoning Attacks on PTM)
