Anomaly detection plays a vital role in industrial manufacturing. Because real defect images are scarce, unsupervised approaches that rely solely on normal images have been studied extensively. Recently, diffusion-based generative models have drawn attention to training data synthesis as an alternative solution. In this work, we focus on a strategy for leveraging synthetic images effectively to maximize anomaly detection performance. Previous synthesis strategies fall broadly into two groups that present a clear trade-off. Rule-based synthesis, such as injecting noise or pasting patches, is cost-effective but often fails to produce realistic defect images. Generative model-based synthesis, on the other hand, can create high-quality defect images but incurs substantial cost. To address this problem, we propose a novel framework that leverages a pre-trained text-guided image-to-image translation model and an image retrieval model to efficiently generate synthetic defect images. Specifically, the image retrieval model assesses the similarity of the generated images to real normal images and filters out irrelevant outputs, thereby enhancing the quality and relevance of the generated defect images. To leverage the synthetic images effectively, we also introduce a two-stage training strategy: the model is first pre-trained on a large volume of images from rule-based synthesis and then fine-tuned on a smaller set of high-quality images. This method significantly reduces the cost of data collection while improving anomaly detection performance. Experiments on the MVTec AD dataset demonstrate the effectiveness of our approach.
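The retrieval-based filtering step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, the embedding model, or the acceptance rule, so everything below is an assumption made for illustration: images are represented by precomputed embedding vectors, similarity is cosine similarity, and a generated image is kept when its best match among the real normal images reaches a hypothetical `threshold`. The function names `cosine_similarity` and `filter_generated` are likewise hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_generated(generated, normals, threshold=0.5):
    """Return indices of generated embeddings that resemble at least one normal embedding.

    Assumption: an output counts as relevant when its highest cosine similarity
    to any real normal image is at least `threshold`. The paper may use a
    different rule.
    """
    kept = []
    for idx, g in enumerate(generated):
        best = max(cosine_similarity(g, n) for n in normals)
        if best >= threshold:
            kept.append(idx)
    return kept


# Toy 2-D embeddings standing in for features from an image retrieval model.
normal_embeddings = [[1.0, 0.0], [0.0, 1.0]]
generated_embeddings = [[0.9, 0.1], [-1.0, -1.0], [0.6, 0.8]]
kept_indices = filter_generated(generated_embeddings, normal_embeddings, threshold=0.5)
```

In this toy example the second generated embedding points away from both normal embeddings and is discarded, while the other two are kept. In practice the embeddings would come from whichever retrieval model is used, and the threshold would need tuning so that realistic defects are not rejected for differing too much from normal images.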