It is hard to collect enough flaw images for training deep learning networks in industrial production. Therefore, existing industrial anomaly detection methods prefer CNN-based unsupervised detection and localization networks for this task. However, these methods often fail when variations appear in new samples, since traditional end-to-end networks face barriers in fitting nonlinear models in high-dimensional space. Moreover, they essentially build a memory bank by clustering the features of normal images, which makes them not robust to texture changes. To this end, we propose a Vision Transformer based (ViT-based) unsupervised anomaly detection network. It utilizes hierarchical task learning and human experience to enhance its interpretability. Our network consists of a pattern generation network and a comparison network. The pattern generation network uses two ViT-based encoder modules to extract features from two consecutive image patches, then uses a ViT-based decoder module to learn the human-designed style of these features and predict the third image patch. After this, we use a Siamese-based network to compute the similarity between the generated image patch and the original image patch. Finally, we refine the anomaly localization with a bi-directional inference strategy. Comparison experiments on the public MVTec dataset show that our method achieves 99.8% AUC, surpassing previous state-of-the-art methods. In addition, we give a qualitative illustration on our own leather and cloth datasets. The accurate segmentation results strongly demonstrate the accuracy of our method in anomaly detection.
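The predict-and-compare pipeline described above can be illustrated with a toy sketch. This is not the paper's model: the real system uses ViT encoders, a ViT decoder, and a Siamese comparison network, whereas here `predict_next_patch` is a hypothetical linear stand-in for the learned generator, and the Siamese similarity is replaced by plain cosine similarity. The bi-directional refinement (scoring each patch from both a forward and a backward prediction, then averaging) follows the strategy named in the abstract.

```python
import math

def cosine_similarity(a, b):
    # Similarity between two flattened patch feature vectors.
    # Stands in for the Siamese comparison network in the paper.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def predict_next_patch(p1, p2):
    # Hypothetical stand-in for the ViT-based pattern generation
    # network: linearly extrapolates a third patch from two
    # consecutive patches (the real model learns this mapping).
    return [2.0 * b - a for a, b in zip(p1, p2)]

def anomaly_score(patches, idx):
    # Forward prediction: patches idx-2, idx-1 -> idx.
    fwd = predict_next_patch(patches[idx - 2], patches[idx - 1])
    # Backward prediction: patches idx+2, idx+1 -> idx.
    bwd = predict_next_patch(patches[idx + 2], patches[idx + 1])
    # Bi-directional inference: a patch is anomalous when it is
    # dissimilar to what both directions predict; average the
    # two dissimilarities to refine the localization score.
    s_fwd = 1.0 - cosine_similarity(fwd, patches[idx])
    s_bwd = 1.0 - cosine_similarity(bwd, patches[idx])
    return 0.5 * (s_fwd + s_bwd)
```

With a smoothly varying patch sequence the predictions match the observed patch and the score is near zero; a patch that breaks the learned pattern disagrees with both predictions and scores high, which is the mechanism behind the anomaly localization described in the abstract.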