Deep neural networks are vulnerable to adversarial attacks. In this paper, we take the role of investigators who want to trace the attack and identify its source, that is, the particular model from which the adversarial examples were generated. Techniques derived from this work would aid forensic investigation of attack incidents and serve as a deterrent to potential attacks. We consider the buyers-seller setting, where a machine learning model is distributed to various buyers and each buyer receives a slightly different copy with the same functionality. A malicious buyer generates adversarial examples from a particular copy $\mathcal{M}_i$ and uses them to attack other copies. From these adversarial examples, the investigator wants to identify the source $\mathcal{M}_i$. To address this problem, we propose a two-stage separate-and-trace framework. The model separation stage generates multiple copies of a model for the same classification task. This process injects unique characteristics into each copy so that the adversarial examples generated from it have distinct and traceable features. We achieve this with a parallel structure that embeds a ``tracer'' in each copy, together with a noise-sensitive training loss. The tracing stage takes in adversarial examples and a few candidate models, and identifies the likely source. Based on the unique features induced by the noise-sensitive loss function, we can effectively trace the potential adversarial copy by examining the output logits of each tracer. Empirical results show that it is possible to trace the origin of adversarial examples and that the mechanism can be applied to a wide range of architectures and datasets.
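To make the separate-and-trace idea concrete, the following PyTorch sketch shows one possible instantiation under stated assumptions: each distributed copy pairs the classifier with a copy-specific tracer head, a loss term encourages the tracer's logits to react strongly to small input perturbations, and a tracing routine scores candidate copies by their tracer response. The names (`TracedCopy`, `noise_sensitive_loss`, `trace_source`), the perturbation budget, the loss weight, and the response statistic are illustrative assumptions and not the paper's exact formulation.

```python
# Hypothetical sketch of the separate-and-trace framework; not the paper's exact method.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TracedCopy(nn.Module):
    """One distributed copy: a classification head plus a copy-specific tracer head
    attached in parallel to a shared feature extractor."""

    def __init__(self, base: nn.Module, feat_dim: int, num_classes: int):
        super().__init__()
        self.base = base                                 # shared feature extractor
        self.cls_head = nn.Linear(feat_dim, num_classes) # task classifier
        self.tracer = nn.Linear(feat_dim, num_classes)   # copy-specific tracer branch

    def forward(self, x):
        feats = self.base(x)
        return self.cls_head(feats), self.tracer(feats)


def noise_sensitive_loss(model_copy: TracedCopy, x, y, eps=8 / 255, weight=0.1):
    """Clean classification loss plus a term rewarding large tracer-logit shifts
    under small random perturbations (a stand-in for the noise-sensitive objective)."""
    logits_clean, tracer_clean = model_copy(x)
    x_noisy = (x + eps * torch.randn_like(x).sign()).clamp(0, 1)
    _, tracer_noisy = model_copy(x_noisy)
    ce = F.cross_entropy(logits_clean, y)
    sensitivity = -F.mse_loss(tracer_noisy, tracer_clean)  # negative: reward large shift
    return ce + weight * sensitivity


@torch.no_grad()
def trace_source(adv_x, candidate_copies):
    """Attribute adversarial examples to the candidate copy whose tracer logits
    respond most strongly, using a simple magnitude statistic."""
    scores = []
    for model_copy in candidate_copies:
        _, tracer_logits = model_copy(adv_x)
        scores.append(tracer_logits.abs().mean().item())
    return int(torch.tensor(scores).argmax())
```

In this sketch, attribution relies only on forward passes through the candidate copies' tracer branches, which matches the abstract's description of tracing by inspecting tracer output logits; the specific scoring rule above is a simplifying assumption.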