Visual anomaly detection, an important problem in computer vision, is usually formulated as a one-class classification and segmentation task. The student-teacher (S-T) framework has proved effective for this task. However, previous S-T-based works only empirically applied constraints on normal data and fused multi-level information. In this study, we propose an improved model called DeSTSeg, which integrates a pre-trained teacher network, a denoising student encoder-decoder, and a segmentation network into one framework. First, to strengthen the constraints on anomalous data, we introduce a denoising procedure that allows the student network to learn more robust representations: given synthetically corrupted normal images, we train the student network to match the teacher network's features of the same images without corruption. Second, to fuse the multi-level S-T features adaptively, we train a segmentation network with rich supervision from synthetic anomaly masks, achieving a substantial performance improvement. Experiments on an industrial inspection benchmark dataset demonstrate that our method achieves state-of-the-art performance: 98.6% on image-level ROC AUC, 75.8% on pixel-level average precision, and 76.4% on instance-level average precision.
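The denoising objective described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the noise-patch corruption, the cosine-distance loss, and the names `corrupt`, `cosine_distance_map`, and `denoising_loss` are all assumptions made for illustration, and the feature maps are stand-ins for what a teacher and a student network would produce.

```python
import numpy as np


def corrupt(image, rng, patch=8):
    """Paste a random-noise patch into a copy of `image` (H, W, C).

    Hypothetical corruption scheme; returns the corrupted image and a
    binary (H, W) mask marking the synthetic anomaly.
    """
    h, w = image.shape[:2]
    top = int(rng.integers(0, h - patch + 1))
    left = int(rng.integers(0, w - patch + 1))
    corrupted = image.copy()
    mask = np.zeros((h, w), dtype=np.float32)
    noise = rng.random((patch, patch) + image.shape[2:])
    corrupted[top:top + patch, left:left + patch] = noise
    mask[top:top + patch, left:left + patch] = 1.0
    return corrupted, mask


def cosine_distance_map(f_teacher, f_student, eps=1e-8):
    """Per-location cosine distance between two (H, W, C) feature maps."""
    dot = (f_teacher * f_student).sum(axis=-1)
    norm = np.linalg.norm(f_teacher, axis=-1) * np.linalg.norm(f_student, axis=-1)
    return 1.0 - dot / (norm + eps)


def denoising_loss(teacher_feats_clean, student_feats_corrupted):
    """Mean distance over all feature levels (assumed loss form).

    The teacher sees the clean image and the student sees the corrupted
    one, so minimizing this pushes the student to "denoise" its features.
    """
    per_level = [
        cosine_distance_map(t, s).mean()
        for t, s in zip(teacher_feats_clean, student_feats_corrupted)
    ]
    return float(np.mean(per_level))
```

In this sketch the mask returned by `corrupt` plays the role of the synthetic anomaly mask that could supervise a segmentation network, while `denoising_loss` would be evaluated on teacher features of the clean image and student features of the corrupted one.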