Visual anomaly detection, which relies on knowledge of normal samples only, has wide applications in industrial scenarios and has attracted significant attention. However, most existing methods fail to meet the requirements of these scenarios. In this paper, we introduce the Dual-attention Transformer and Discriminative Flow (DADF) framework, a new paradigm for visual anomaly detection: it first leverages a pre-trained network to acquire multi-scale prior embeddings, and then develops a vision Transformer with dual attention mechanisms, namely self-attention and memorial-attention, to achieve two-level reconstruction of the prior embeddings with sequential and normality association. Additionally, we propose a normalizing flow to establish a discriminative likelihood for the joint distribution of the prior and reconstructed embeddings at each scale. DADF achieves state-of-the-art results of 98.3/98.4 image/pixel AUROC on MVTec AD, and 83.7 image AUROC and 67.4 pixel sPRO on the MVTec LOCO AD benchmark, demonstrating the effectiveness of the proposed approach.
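To make the pipeline described above concrete, the following is a minimal, illustrative sketch in PyTorch of the three stages (pre-trained prior embeddings, dual-attention reconstruction, and a likelihood head over the joint prior/reconstruction features). All module names, dimensions, the single-scale simplification, and the use of a small scoring MLP as a stand-in for the normalizing-flow likelihood are assumptions for illustration; the actual DADF architecture may differ.

```python
# Illustrative sketch only; components are assumptions, not the paper's implementation.
import torch
import torch.nn as nn
import torchvision.models as models

class DADFSketch(nn.Module):
    def __init__(self, embed_dim=512, num_memory=50):
        super().__init__()
        # Pre-trained backbone supplies prior embeddings (one scale kept for brevity).
        backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        self.prior = nn.Sequential(*list(backbone.children())[:-3])  # up to layer3
        self.proj = nn.Conv2d(256, embed_dim, kernel_size=1)

        # Dual-attention Transformer: self-attention over the patch sequence,
        # then cross-attention against a learned normality memory
        # (standing in for the paper's "memorial-attention").
        self.self_attn = nn.TransformerEncoderLayer(embed_dim, nhead=8, batch_first=True)
        self.memory = nn.Parameter(torch.randn(num_memory, embed_dim))
        self.mem_attn = nn.MultiheadAttention(embed_dim, num_heads=8, batch_first=True)

        # Stand-in for the discriminative flow: scores the joint
        # (prior, reconstruction) features per patch.
        self.flow = nn.Sequential(nn.Linear(2 * embed_dim, embed_dim), nn.ReLU(),
                                  nn.Linear(embed_dim, 1))

    def forward(self, x):
        feat = self.proj(self.prior(x))            # (B, D, H, W) prior embeddings
        B, D, H, W = feat.shape
        tokens = feat.flatten(2).transpose(1, 2)   # (B, H*W, D) patch sequence

        recon = self.self_attn(tokens)             # level 1: sequential association
        mem = self.memory.unsqueeze(0).expand(B, -1, -1)
        recon, _ = self.mem_attn(recon, mem, mem)  # level 2: normality association

        joint = torch.cat([tokens, recon], dim=-1) # joint prior/reconstruction features
        score = self.flow(joint).squeeze(-1)       # per-patch anomaly evidence
        return score.view(B, H, W)

# Usage: higher values mark patches the normality model explains poorly.
anomaly_map = DADFSketch()(torch.randn(1, 3, 256, 256))
```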