We propose using gradients to detect adversarial and out-of-distribution samples. We introduce confounding labels -- labels that differ from the normal labels seen during training -- into gradient generation to probe the effective expressivity of neural networks. Gradients depict the amount of change required for a model to properly represent a given input, providing insight into the representational power the model has acquired from its architecture and training data. By introducing a label of a different design, we remove the dependency on ground truth labels for gradient generation during inference. We show that our gradient-based approach captures anomalies in inputs based on the effective expressivity of the model, requires no hyperparameter tuning or additional processing, and outperforms state-of-the-art methods for adversarial and out-of-distribution detection.
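To make the mechanism concrete, below is a minimal PyTorch sketch of gradient generation with a confounding label. It assumes an all-ones vector over the classes as the confounding label and the squared L2 norm of the resulting weight gradients as the anomaly score; the specific label design, loss, and norm here are illustrative assumptions, not the paper's exact specification.

```python
import torch
import torch.nn.functional as F

def confounding_gradient_score(model, x):
    """Anomaly score from gradients generated with a confounding label.

    The confounding label is an all-ones vector over the classes, which never
    matches a one-hot training label, so no ground-truth label is needed at
    inference. A larger gradient norm suggests the model needs more adjustment
    to represent the input, flagging it as adversarial / out-of-distribution.
    """
    model.zero_grad()
    logits = model(x)                        # shape: (batch, num_classes)
    confounding = torch.ones_like(logits)    # label unlike any training label
    loss = F.binary_cross_entropy_with_logits(logits, confounding)
    loss.backward()
    # Squared L2 norm of all parameter gradients as the anomaly score.
    return sum((p.grad ** 2).sum().item()
               for p in model.parameters() if p.grad is not None)
```

A higher score than those observed on in-distribution validation data would then indicate an anomalous input; the thresholding step is left to the detection protocol.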