Deep neural networks are known to be vulnerable to security attacks. One particular threat is the Trojan attack, in which an attacker stealthily manipulates a model's behavior through Trojaned training samples that can later be exploited at test time. Guided by basic neuroscientific principles, we discover subtle yet critical structural deviations that characterize Trojaned models. Our analysis uses topological tools, which allow us to model high-order dependencies in the networks, robustly compare different networks, and localize structural abnormalities. One interesting observation is that Trojaned models develop short-cuts from the input to the output layers. Inspired by these observations, we devise a strategy for the robust detection of Trojaned models, which outperforms standard baselines on multiple benchmarks.
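The abstract leaves the topological pipeline unspecified. As one minimal sketch, assume the high-order dependencies are summarized by correlations between neuron activations on a probe set, converted into a distance matrix, and analyzed with 0-dimensional persistent homology; the function `zero_dim_persistence` and the synthetic `activations` below are illustrative assumptions, not the paper's exact method. The 0-dimensional case can be computed from scratch, since the death times of connected components in a Vietoris-Rips filtration coincide with the edge weights of the minimum spanning tree (Kruskal's algorithm).

```python
import numpy as np

def zero_dim_persistence(dist):
    """0-dimensional persistent homology of a Vietoris-Rips filtration.

    Every point is born at filtration value 0; components merge along
    minimum-spanning-tree edges, so the finite death times are exactly
    the MST edge weights found by Kruskal's algorithm.
    """
    n = dist.shape[0]
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Enumerate all pairwise edges, sorted by filtration value (distance).
    i, j = np.triu_indices(n, k=1)
    order = np.argsort(dist[i, j])
    deaths = []
    for k in order:
        a, b = find(i[k]), find(j[k])
        if a != b:                    # two components merge: one class dies
            parent[a] = b
            deaths.append(dist[i[k], j[k]])
    return np.array(deaths)           # n-1 finite death times

# Hypothetical usage: activations of shape (num_samples, num_neurons),
# e.g. recorded by feeding a probe set through the network under test.
rng = np.random.default_rng(0)
activations = rng.normal(size=(256, 32))
corr = np.corrcoef(activations.T)     # neuron-neuron correlation matrix
dist = 1.0 - np.abs(corr)             # correlation distance
np.fill_diagonal(dist, 0.0)
deaths = zero_dim_persistence(dist)
print("persistence summary:", deaths.mean(), deaths.max())
```

Under this reading, summary statistics of such persistence values, computed per model, could serve as features for a downstream classifier that separates Trojaned from clean networks; unusually strong correlations between early- and late-layer neurons would surface as abnormal persistence structure, consistent with the short-cut observation above.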