Generalizable Industrial Visual Anomaly Detection with Self-Induction Vision Transformer

Industrial visual anomaly detection plays a critical role in advanced intelligent manufacturing, but several limitations remain. First, existing reconstruction-based methods tend to learn trivial identity-mapping shortcuts, so the reconstruction error gap between normal and abnormal samples becomes hard to discern, leading to inferior detection capability. Second, previous studies have mainly concentrated on convolutional neural network (CNN) models, which capture the local semantics of objects but neglect the global context, also resulting in inferior performance. Third, existing studies follow an individual learning fashion in which each detection model handles only one product category, while generalizable detection across multiple categories has not been explored. To tackle these limitations, we propose a self-induction vision Transformer (SIVT) for unsupervised, generalizable, multi-category industrial visual anomaly detection and localization. SIVT first extracts discriminative features from a pre-trained CNN as property descriptors. Then, the self-induction vision Transformer reconstructs the extracted features in a self-supervised fashion, with auxiliary induction tokens additionally introduced to induct the semantics of the original signal. Finally, abnormal properties are detected using the semantic feature residual difference. Experiments on the existing MVTec AD benchmark show that the proposed method advances state-of-the-art detection performance, with improvements of 2.8-6.3 in AUROC and 3.3-7.6 in AP.
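
The final step, detecting anomalies from the feature residual, can be illustrated with a minimal sketch. The abstract does not specify the exact scoring rule, so this example assumes a per-location L2 norm of the difference between the extracted and reconstructed features, and a max over locations as the image-level score. The function names residual_anomaly_map and image_score are hypothetical, and plain nested lists stand in for the CNN feature maps.

```python
import math

def residual_anomaly_map(features, reconstruction):
    """Return an H x W map of L2 norms of the per-location feature residual.

    Both inputs are nested lists shaped [H][W][C]. The L2 norm is an
    assumption made for illustration, not the paper's stated metric.
    """
    return [
        [
            math.sqrt(sum((f - r) ** 2 for f, r in zip(f_vec, r_vec)))
            for f_vec, r_vec in zip(f_row, r_row)
        ]
        for f_row, r_row in zip(features, reconstruction)
    ]

def image_score(anomaly_map):
    """Image-level score: the largest residual in the map (assumed rule)."""
    return max(max(row) for row in anomaly_map)

# Toy 2 x 2 feature map with 2 channels; only the last location is
# reconstructed poorly, so it should stand out in the anomaly map.
features = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [3.0, 4.0]]]
reconstruction = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [0.0, 0.0]]]

amap = residual_anomaly_map(features, reconstruction)
score = image_score(amap)
```

Locations whose features are reconstructed well contribute a residual near zero, while poorly reconstructed locations dominate both the localization map and the image-level score.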
