Detecting out-of-distribution (OOD) and adversarial samples is essential when deploying classification models in real-world applications. We introduce Channel Mean Discrepancy (CMD), a model-agnostic distance metric, inspired by integral probability metrics, for evaluating the statistics of features extracted by classification models. CMD compares the feature statistics of each incoming sample against feature statistics estimated from previously seen training samples, with minimal overhead. We show experimentally that the CMD magnitude is significantly smaller for legitimate samples than for OOD and adversarial samples. Building on this, we propose a simple method that uses CMD to reliably distinguish legitimate samples from OOD and adversarial ones, requiring only a single forward pass per sample through a pre-trained classification model. We further demonstrate how to achieve single-image detection by using a lightweight model for channel sensitivity tuning, which improves on other statistical detection methods. Preliminary results show that our simple yet effective method outperforms several state-of-the-art approaches to detecting OOD and adversarial samples across various datasets and attack methods, with high efficiency and generalizability.
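As a rough illustration of the kind of comparison CMD performs, the sketch below reduces each channel of a feature map to its spatial mean, stores the average channel means over training samples, and scores a new sample by a weighted Euclidean distance between the two mean vectors. The abstract does not give the exact CMD formula, the layer(s) it is computed on, or how detection thresholds are chosen, so all of these are assumptions here. The function names (`channel_means`, `fit_reference`, `cmd_score`, `calibrate_threshold`, `is_suspicious`) are hypothetical, the quantile-based threshold is an assumed calibration rule, and the optional per-channel `weights` merely stand in for the output of the channel-sensitivity tuning model rather than reproducing it.

```python
import math
from statistics import fmean

# Hypothetical sketch (not the authors' implementation): a feature map is a
# list of C channels, each a flat list of that channel's H*W activations.

def channel_means(feature_map):
    """Reduce each channel to its spatial mean (assumed channel statistic)."""
    return [fmean(channel) for channel in feature_map]

def fit_reference(train_maps):
    """Average the per-channel means over training feature maps."""
    per_sample = [channel_means(fm) for fm in train_maps]
    num_channels = len(per_sample[0])
    return [fmean(s[c] for s in per_sample) for c in range(num_channels)]

def cmd_score(feature_map, ref_means, weights=None):
    """Weighted Euclidean distance between a sample's channel means and the
    training reference. `weights` is an assumed stand-in for channel
    sensitivity tuning; uniform weights are used when none are given."""
    mu = channel_means(feature_map)
    if weights is None:
        weights = [1.0] * len(mu)
    return math.sqrt(sum(w * (m - r) ** 2
                         for w, m, r in zip(weights, mu, ref_means)))

def calibrate_threshold(legit_scores, quantile=0.95):
    """Assumed calibration rule: take an empirical quantile of the scores
    observed on held-out legitimate samples."""
    ordered = sorted(legit_scores)
    idx = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[idx]

def is_suspicious(feature_map, ref_means, threshold, weights=None):
    """Flag a sample as OOD/adversarial when its score exceeds the threshold."""
    return cmd_score(feature_map, ref_means, weights) > threshold

# Toy usage: two channels with two spatial positions each.
train_maps = [[[1.0, 1.0], [0.0, 0.0]], [[1.0, 3.0], [2.0, 0.0]]]
ref = fit_reference(train_maps)
legit = [[1.5, 1.5], [0.5, 0.5]]
odd = [[5.0, 5.0], [4.5, 4.5]]
print(ref, cmd_score(legit, ref), cmd_score(odd, ref))
```

With uniform weights, this score is simply the distance between mean feature vectors, which is what maximum mean discrepancy (an integral probability metric) reduces to under a linear kernel; a threshold calibrated on legitimate data then turns the score into a single-pass detector.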