A large number of studies on Graph Outlier Detection (GOD) have emerged in recent years due to its wide applications, and Unsupervised Node Outlier Detection (UNOD) on attributed networks is an important area within it. UNOD focuses on detecting two typical kinds of outliers in graphs: structural outliers and contextual outliers. Most existing works conduct experiments on datasets with injected outliers. However, we find that the most widely used outlier injection approach has a serious data leakage issue: by exploiting this leakage alone, a simple approach can achieve state-of-the-art performance in detecting outliers. In addition, we observe that the performance of existing algorithms drops once the data leakage is mitigated. The other major issue is balanced detection performance between the two types of outliers, which has not been considered by existing studies. In this paper, we analyze the cause of the data leakage issue in depth, since the injection approach is a building block for advancing UNOD. Moreover, we devise a novel variance-based model to detect structural outliers, which significantly outperforms existing algorithms and is more robust across different injection settings. On top of this, we propose a new framework, Variance-based Graph Outlier Detection (VGOD), which combines our variance-based model with an attribute reconstruction model to detect outliers in a balanced way. Finally, we conduct extensive experiments to demonstrate the effectiveness and efficiency of VGOD. The results on 5 real-world datasets validate that VGOD achieves not only the best performance in detecting outliers but also balanced detection performance between structural and contextual outliers.
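To make the intuition behind a variance-based structural score concrete, here is a minimal sketch. It is not the VGOD model itself, whose details the abstract does not give. It assumes that a structural outlier is a node linked to mutually dissimilar nodes, so the attributes of its neighbors spread widely. The function name `neighbor_variance_scores`, the toy graph, and the use of raw attributes in place of learned representations are all assumptions made for illustration.

```python
import numpy as np

def neighbor_variance_scores(adj, feats):
    """Illustrative score: mean per-dimension variance of each node's neighbor features.

    Hypothetical helper, not the authors' method. Nodes with fewer than
    two neighbors receive a score of 0.
    """
    n = adj.shape[0]
    scores = np.zeros(n)
    for i in range(n):
        nbrs = np.flatnonzero(adj[i])  # indices of node i's neighbors
        if nbrs.size > 1:
            scores[i] = feats[nbrs].var(axis=0).mean()
    return scores

# Toy graph (assumed data): two tight communities {0, 1, 2} and {3, 4, 5},
# plus node 6, which links into both communities.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5),
         (6, 0), (6, 1), (6, 3), (6, 4)]
adj = np.zeros((7, 7), dtype=int)
for u, v in edges:
    adj[u, v] = adj[v, u] = 1  # undirected graph

feats = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],   # community A
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],   # community B
    [2.5, 2.5],                           # node 6
])

scores = neighbor_variance_scores(adj, feats)
ranking = np.argsort(-scores)  # most suspicious node first
print(ranking, np.round(scores, 3))
```

In this toy graph, node 6 receives the highest score because its neighbors come from both communities. The neighbors of node 6 also score above the untouched nodes 2 and 5, a side effect of using raw attributes that a learned model would be expected to handle more carefully.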