We propose Gumbel Noise Score Matching (GNSM), a novel unsupervised method for detecting anomalies in categorical data. GNSM does this by estimating the scores, i.e., the gradients of the log-likelihood with respect to the inputs, of continuously relaxed categorical distributions. We test our method on a suite of tabular anomaly detection datasets, where GNSM achieves consistently high performance across all experiments. We further demonstrate the flexibility of GNSM by applying it to image data, where the model is tasked with detecting poor segmentation predictions. Images that GNSM ranks as anomalous show clear segmentation failures, and the outputs of GNSM correlate strongly with segmentation metrics computed against the ground truth. We outline the score matching training objective used by GNSM and provide an open-source implementation of our work.
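To make "continuously relaxed categorical distributions" concrete, the following is a minimal sketch of a Gumbel-softmax relaxation, a common way to turn a one-hot categorical value into a point in the interior of the probability simplex. It is an illustration only, not the GNSM implementation: the function names, the temperature of 0.5, and the `1e-6` smoothing constant are assumptions made for this example.

```python
import math
import random


def sample_gumbel(rng):
    """Draw one standard Gumbel(0, 1) variate via inverse-CDF sampling."""
    # Clamp u strictly inside (0, 1) so both logarithms are defined.
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))


def gumbel_softmax(logits, temperature, rng):
    """Return a relaxed (continuous) sample on the simplex for one categorical variable."""
    # Perturb each logit with independent Gumbel noise, then scale by the temperature.
    perturbed = [(logit + sample_gumbel(rng)) / temperature for logit in logits]
    # Numerically stable softmax over the categories.
    peak = max(perturbed)
    weights = [math.exp(p - peak) for p in perturbed]
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(0)
# Hypothetical categorical feature with three levels; the observed level is index 1.
one_hot = [0.0, 1.0, 0.0]
# Smoothed log-probabilities (illustrative choice, avoids log(0)).
logits = [math.log(p + 1e-6) for p in one_hot]
relaxed = gumbel_softmax(logits, temperature=0.5, rng=rng)
print(relaxed)
```

Because the relaxed sample is continuous, a log-density over it has well-defined gradients with respect to the input, which is the quantity a score-matching model estimates. Lower temperatures push samples toward the one-hot corners of the simplex, while higher temperatures spread them toward its center.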