Graph-Level Outlier Detection (GLOD) is the task of identifying unusual graphs within a graph database, and it has received little attention compared to node-level detection in a single graph. Because propagation-based graph embedding by GNNs and graph kernels has achieved promising results on another graph-level task, namely graph classification, we study applying those models to GLOD. Instead of developing new models, this paper identifies and examines a fundamental and intriguing issue with applying propagation-based models to GLOD, with evaluation conducted on repurposed binary graph classification datasets in which one class is down-sampled to serve as the outlier class. We find that the ROC-AUC performance of the models changes significantly (flipping from high to low) depending on which class is down-sampled. Interestingly, the ROC-AUCs on these two variants approximately sum to 1, and their performance gap is amplified as the number of propagations increases. We carefully study the graph embedding space produced by propagation-based models and find two driving factors: (1) a disparity between within-class densities, which is amplified by propagation, and (2) overlapping support (mixing of embeddings) across classes. Our study sheds light, for the first time, on the effects of using graph propagation-based models and classification datasets for outlier detection.
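The evaluation protocol and the "sum to 1" observation can be illustrated with a minimal sketch. It is not the paper's pipeline: the outlier scores are hypothetical draws from two Gaussians (standing in for a sparse class A and a dense class B in embedding space), the helper names `roc_auc` and `variant_auc` are made up for illustration, and the scores are held fixed across the two variants, whereas a real detector would be refit on each variant.

```python
import random


def roc_auc(scores, labels):
    """ROC-AUC as the probability that a random outlier (label 1)
    receives a higher score than a random inlier (label 0); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def variant_auc(inlier_scores, outlier_scores, keep_fraction, rng):
    """Build one GLOD variant: keep every inlier graph, down-sample the
    outlier class to keep_fraction of its graphs, then compute ROC-AUC."""
    k = max(1, int(len(outlier_scores) * keep_fraction))
    kept = rng.sample(outlier_scores, k)
    scores = inlier_scores + kept
    labels = [0] * len(inlier_scores) + [1] * len(kept)
    return roc_auc(scores, labels)


rng = random.Random(0)

# Hypothetical outlier scores (assumption, not data from the paper):
# class A is sparse in embedding space, so it tends to score higher;
# class B is dense, so it tends to score lower.
class_a = [rng.gauss(1.0, 1.0) for _ in range(500)]
class_b = [rng.gauss(0.0, 1.0) for _ in range(500)]

# Variant 1: class A is down-sampled and treated as the outlier class.
auc_a_outlier = variant_auc(class_b, class_a, 0.1, rng)
# Variant 2: class B is down-sampled and treated as the outlier class.
auc_b_outlier = variant_auc(class_a, class_b, 0.1, rng)

print(f"A as outlier: {auc_a_outlier:.3f}")
print(f"B as outlier: {auc_b_outlier:.3f}")
print(f"sum:          {auc_a_outlier + auc_b_outlier:.3f}")
```

Under the fixed-score assumption, swapping the outlier class amounts to flipping the labels, which turns an AUC of a into 1 - a; down-sampling only adds sampling noise, so the two values sum to approximately 1. Whether scores stay comparable across variants in practice depends on the model, which is what the embedding-space analysis addresses.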