Within a large database G containing graphs with labeled nodes and directed, multi-edges; how can we detect the anomalous graphs? Most existing work are designed for plain (unlabeled) and/or simple (unweighted) graphs. We introduce CODETECT, the first approach that addresses the anomaly detection task for graph databases with such complex nature. To this end, it identifies a small representative set S of structural patterns (i.e., node-labeled network motifs) that losslessly compress database G as concisely as possible. Graphs that do not compress well are flagged as anomalous. CODETECT exhibits two novel building blocks: (i) a motif-based lossless graph encoding scheme, and (ii) fast memory-efficient search algorithms for S. We show the effectiveness of CODETECT on transaction graph databases from three different corporations, where existing baselines adjusted for the task fall behind significantly, across different types of anomalies and performance metrics.
翻译:在大型数据库G中,含有带有标签节点和定向、多层格的图表;我们如何探测异常图?大多数现有工作是为普通(未贴标签)和(或)简单(未加权)图表设计的。我们引入了CODETECT,这是处理性质如此复杂的图形数据库异常检测任务的第一个方法。为此,它确定了一组具有代表性的小型结构模式S(即,有标签的网络模型),这种结构模式尽可能简明扼要,无损压缩数据库G。没有压缩的图表标记为异常图。CODETECT展示了两个新型建筑块:(一) 基于无损图案的无损图解编码办法,和(二) S. 我们显示了CODETECT在三个不同公司交易图数据库上的有效性,在这三个公司中,根据任务调整的现有基线大大落后于不同类型异常和性能指标。