One-class novelty detectors are trained on examples of a single known class and must decide whether a query example belongs to that class. Most recent approaches adopt a deep auto-encoder-style architecture to compute novelty scores for detecting novel-class data. Deep networks have been shown to be vulnerable to adversarial attacks, yet little attention has been devoted to the adversarial robustness of deep novelty detectors. In this paper, we first show that existing novelty detectors are susceptible to adversarial examples. We further demonstrate that defenses commonly used for classification tasks have limited effectiveness in one-class novelty detection, which motivates a defense designed specifically for novelty detection. To this end, we propose a defense strategy that manipulates the latent space of novelty detectors to improve robustness against adversarial examples. The proposed method, referred to as Principal Latent Space (PLS), learns incrementally trained cascade principal components in the latent space to robustify novelty detectors. PLS can purify the latent space against adversarial examples and constrain it to exclusively model the known-class distribution. We conduct extensive experiments across multiple attacks, datasets, and novelty detectors, showing that PLS consistently enhances the adversarial robustness of novelty detection models.
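To make the idea of purifying a latent space with principal components more concrete, the following is a minimal sketch and not the PLS method itself. It assumes latent codes are plain NumPy vectors, uses a single ordinary PCA fit in place of the incrementally trained cascade described above, and substitutes random noise for a real adversarial perturbation. The function names `fit_pca` and `purify`, the dimensions, and the synthetic data are all hypothetical.

```python
import numpy as np


def fit_pca(latents, k):
    """Return the mean and top-k principal directions (k x d) of latent codes."""
    mean = latents.mean(axis=0)
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(latents - mean, full_matrices=False)
    return mean, vt[:k]


def purify(z, mean, components):
    """Project latent codes onto the principal subspace and map them back."""
    coords = (z - mean) @ components.T
    return mean + coords @ components


rng = np.random.default_rng(0)

# Assumption: known-class latent codes lie near a 2-D subspace of R^8.
basis = rng.normal(size=(2, 8))
known = rng.normal(size=(500, 2)) @ basis + 0.01 * rng.normal(size=(500, 8))
mean, comps = fit_pca(known, k=2)

# Random noise stands in for an adversarial perturbation (not a real attack).
z = known[:5]
perturbed = z + 0.5 * rng.normal(size=z.shape)
purified = purify(perturbed, mean, comps)

# Distance from the clean latent code before and after projection.
err_before = np.linalg.norm(perturbed - z, axis=1)
err_after = np.linalg.norm(purified - z, axis=1)
print("mean distance before:", err_before.mean())
print("mean distance after: ", err_after.mean())
```

In this toy setting the projection discards the part of the perturbation that falls outside the subspace spanned by known-class latent codes, which is the intuition behind constraining the latent space to the known-class distribution. A real detector would apply such a step to the encoder output of a deep auto-encoder and would need to be evaluated against actual attacks.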