MISA: Online Defense of Trojaned Models using Misattributions

Recent studies have shown that neural networks are vulnerable to Trojan attacks, where a network is trained to respond to specially crafted trigger patterns in the inputs in specific and potentially malicious ways. This paper proposes MISA, a new online approach to detect Trojan triggers for neural networks at inference time. Our approach is based on a novel notion called misattributions, which captures the anomalous manifestation of a Trojan activation in the feature space. Given an input image and the corresponding output prediction, our algorithm first computes the model's attribution on different features. It then statistically analyzes these attributions to ascertain the presence of a Trojan trigger. Across a set of benchmarks, we show that our method can effectively detect Trojan triggers for a wide variety of trigger patterns, including several recent ones for which there are no known defenses. Our method achieves 96% AUC for detecting images that include a Trojan trigger without any assumptions on the trigger pattern.
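The two-step pipeline described above (compute per-feature attributions, then test them statistically) can be illustrated with a minimal sketch. This is not MISA's actual algorithm: the abstract does not specify the attribution method or the statistical test, so everything here is an assumption chosen for illustration. The sketch uses a toy linear classifier, gradient-times-input attribution, a hypothetical backdoor weight on feature 0, clean inputs that are blank at the trigger location, and a simple quantile threshold on the largest absolute attribution.

```python
import numpy as np


def predict(weights, x):
    """Return the predicted class of a toy linear model (logits = weights @ x)."""
    return int(np.argmax(weights @ x))


def gradient_times_input(weights, x, target):
    """Gradient-times-input attribution for the target logit.

    For a linear model the gradient of the target logit with respect to x
    is simply weights[target], so the attribution is an elementwise product.
    """
    return weights[target] * x


def anomaly_score(attribution):
    """Toy statistic (an assumption): the largest absolute per-feature attribution."""
    return float(np.max(np.abs(attribution)))


rng = np.random.default_rng(0)
n_classes, n_features = 3, 16

# Toy model with bounded weights, plus a hypothetical backdoor:
# feature 0 strongly drives class 2.
weights = rng.uniform(-1.0, 1.0, size=(n_classes, n_features))
weights[2, 0] = 50.0

# Clean inputs lie in [0, 1] and are assumed blank at the trigger location.
clean_inputs = rng.uniform(0.0, 1.0, size=(200, n_features))
clean_inputs[:, 0] = 0.0

# Step 1: attributions for each clean input with respect to its own prediction.
clean_scores = np.array([
    anomaly_score(gradient_times_input(weights, x, predict(weights, x)))
    for x in clean_inputs
])

# Step 2: calibrate a detection threshold on clean data only.
threshold = float(np.quantile(clean_scores, 0.99))

# Stamp the hypothetical trigger onto a clean input and score it.
triggered = clean_inputs[0].copy()
triggered[0] = 1.0
triggered_pred = predict(weights, triggered)
triggered_score = anomaly_score(
    gradient_times_input(weights, triggered, triggered_pred)
)

is_flagged = triggered_score > threshold
clean_flag_rate = float(np.mean(clean_scores > threshold))
print(f"triggered prediction: {triggered_pred}, flagged: {is_flagged}")
print(f"fraction of clean inputs flagged: {clean_flag_rate:.3f}")
```

The threshold is calibrated only on clean inputs, so the detector needs no knowledge of the trigger pattern, which mirrors the trigger-agnostic claim in the abstract. A real deployment would replace the linear model with a deep network and would likely need a more robust attribution method and test statistic.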

