The problem of adversarial defenses for image classification, where the goal is to robustify a classifier against adversarial examples, is considered. Inspired by the hypothesis that these examples lie beyond the natural image manifold, a novel aDversarIal defenSe with local impliCit functiOns (DISCO) is proposed to remove adversarial perturbations by localized manifold projections. DISCO takes an adversarial image and a query pixel location as input and outputs a clean RGB value at that location. It is implemented with an encoder and a local implicit module: the former produces per-pixel deep features, and the latter uses the features in the neighborhood of the query pixel to predict the clean RGB value. Extensive experiments demonstrate that both DISCO and its cascade version outperform prior defenses, regardless of whether the defense is known to the attacker. DISCO is also shown to be data and parameter efficient and to mount defenses that transfer across datasets, classifiers, and attacks.
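The data flow described above (encoder, then a local implicit module queried per pixel) can be illustrated with a minimal sketch. This is not the authors' implementation: the identity "encoder", the two-layer MLP, the window radius, the border clamping, the sigmoid output, and all function names (`encode`, `gather_neighborhood`, `query_rgb`, `defend`) are assumptions chosen for illustration, and the weights are random and untrained, so the outputs are not meaningful clean pixels.

```python
import math
import random

def encode(image):
    """Hypothetical stand-in for the encoder: one feature vector per pixel.
    Here the feature is simply the pixel's RGB triple; the paper's encoder
    produces deep features instead."""
    return [[list(px) for px in row] for row in image]

def gather_neighborhood(features, y, x, radius=1):
    """Concatenate the features in the (2r+1) x (2r+1) window around (y, x).
    Out-of-range positions are clamped to the border (an assumption)."""
    h, w = len(features), len(features[0])
    out = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            yy = min(max(y + dy, 0), h - 1)
            xx = min(max(x + dx, 0), w - 1)
            out.extend(features[yy][xx])
    return out

def make_mlp(in_dim, hidden_dim, out_dim, seed=0):
    """Random, untrained weights for a two-layer MLP (illustrative only)."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(hidden_dim)]
    w2 = [[rng.gauss(0.0, 0.1) for _ in range(hidden_dim)] for _ in range(out_dim)]
    return w1, w2

def query_rgb(features, params, y, x, radius=1):
    """Local implicit module: predict an RGB value at (y, x) from the
    features in the neighborhood of that query location."""
    w1, w2 = params
    local = gather_neighborhood(features, y, x, radius)
    hidden = [max(0.0, sum(w * v for w, v in zip(row, local))) for row in w1]  # ReLU
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in w2]
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]  # squash into (0, 1)

def defend(image, params, radius=1):
    """Rebuild the image by querying every pixel location once."""
    features = encode(image)  # encode once, reuse for all queries
    h, w = len(image), len(image[0])
    return [[query_rgb(features, params, y, x, radius) for x in range(w)]
            for y in range(h)]

# Usage: a 4x4 random "adversarial" image with RGB values in [0, 1].
rng = random.Random(1)
adv_image = [[[rng.random() for _ in range(3)] for _ in range(4)] for _ in range(4)]
params = make_mlp(in_dim=9 * 3, hidden_dim=8, out_dim=3)  # 3x3 window, 3 channels
restored = defend(adv_image, params)
```

In this sketch each output pixel depends only on a small window of features around the query location, which is the sense in which the projection is "localized". A trained version would fit the MLP (and a real encoder) so that the predicted values approximate the clean image.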