Anomaly detection in computer vision is the task of identifying images that deviate from a set of normal images. A common approach is to train deep convolutional autoencoders to inpaint covered parts of an image and compare the output with the original image. By training on anomaly-free samples only, the model is assumed to be unable to reconstruct anomalous regions properly. For anomaly detection by inpainting, we suggest that it is beneficial to incorporate information from potentially distant regions. In particular, we pose anomaly detection as a patch-inpainting problem and propose to solve it with a purely self-attention-based approach, discarding convolutions. The proposed Inpainting Transformer (InTra) is trained to inpaint covered patches in a large sequence of image patches, thereby integrating information across large regions of the input image. When trained from scratch, and in comparison to other methods that do not use extra training data, InTra achieves results on par with the current state of the art on the MVTec AD dataset for detection and surpasses it for segmentation.
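To make the patch-inpainting formulation concrete, the following is a minimal PyTorch sketch of the idea, not the authors' implementation: an image is split into a grid of patches, one patch is replaced by a learned mask token, a transformer encoder attends over the full patch sequence, and the covered patch is reconstructed; the reconstruction error then serves as an anomaly score. All names, dimensions, the mask-token scheme, and the plain L2 score are illustrative assumptions; InTra's actual architecture, positional encodings, and loss differ in detail.

```python
# Minimal sketch of anomaly detection by patch inpainting with self-attention.
# Hypothetical names and hyperparameters; not the InTra reference implementation.
import torch
import torch.nn as nn


class InpaintingTransformerSketch(nn.Module):
    def __init__(self, patch_size=16, grid=7, dim=256, depth=4, heads=8):
        super().__init__()
        patch_dim = 3 * patch_size * patch_size          # flattened RGB patch
        self.embed = nn.Linear(patch_dim, dim)            # patch -> token
        self.mask_token = nn.Parameter(torch.zeros(dim))  # learned "covered" token
        self.pos = nn.Parameter(torch.zeros(1, grid * grid, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(dim, patch_dim)              # token -> reconstructed patch

    def forward(self, patches, masked_idx):
        # patches: (B, N, 3*P*P) flattened image patches, N = grid * grid
        # masked_idx: index of the covered patch to be inpainted
        tokens = (self.embed(patches) + self.pos).clone()
        tokens[:, masked_idx] = self.mask_token            # cover the target patch
        tokens = self.encoder(tokens)                      # global self-attention over all patches
        return self.head(tokens[:, masked_idx])            # reconstruct the covered patch


# Usage sketch: inpaint the centre patch and score it by reconstruction error.
model = InpaintingTransformerSketch()
patches = torch.rand(1, 49, 3 * 16 * 16)   # 7x7 grid of 16x16 RGB patches
recon = model(patches, masked_idx=24)
anomaly_score = (recon - patches[:, 24]).pow(2).mean()
```

Because every token can attend to every other token, the reconstruction of a covered patch can draw on context from distant regions of the image, which is the property the abstract argues is missing from purely convolutional inpainting autoencoders.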