Graph similarity computation is one of the core operations in many graph-based applications, such as graph similarity search, graph database analysis, and graph clustering. Since computing the exact distance or similarity between two graphs is typically NP-hard, a series of approximate methods have been proposed that trade accuracy for speed. Recently, several data-driven approaches based on neural networks have been proposed, most of which model graph-graph similarity as the inner product of the two graph-level representations and differ mainly in the technique used to generate one embedding per graph. However, a single fixed-dimensional embedding per graph may fail to fully capture graphs of varying sizes and link structures, a limitation that is especially problematic for graph similarity computation, where the goal is to find the fine-grained differences between two graphs. In this paper, we address the problem of graph similarity computation from another perspective, by directly matching two sets of node embeddings, without using fixed-dimensional vectors to represent whole graphs. The resulting model, GraphSim, achieves state-of-the-art performance on four real-world graph datasets under six out of eight settings (here we count a specific dataset and metric combination as one setting), compared to existing popular methods for approximate Graph Edit Distance (GED) and Maximum Common Subgraph (MCS) computation.
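The contrast between one embedding per graph and matching two sets of node embeddings can be illustrated with a small sketch. This is only a toy illustration under stated assumptions, not the GraphSim architecture: the node embeddings are hand-picked 2-dimensional vectors rather than learned ones, mean pooling stands in for whatever graph-level readout a given method uses, and the helper names (`mean_pool`, `graph_level_similarity`, `node_similarity_matrix`) are hypothetical.

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def mean_pool(node_embeddings: List[Vector]) -> Vector:
    """Collapse a set of node embeddings into one graph-level vector (assumed readout)."""
    n = len(node_embeddings)
    return [sum(column) / n for column in zip(*node_embeddings)]


def graph_level_similarity(nodes_a: List[Vector], nodes_b: List[Vector]) -> float:
    """Similarity as the inner product of two pooled graph-level embeddings."""
    return dot(mean_pool(nodes_a), mean_pool(nodes_b))


def node_similarity_matrix(nodes_a: List[Vector], nodes_b: List[Vector]) -> List[List[float]]:
    """Pairwise inner products between every node of graph A and every node of graph B."""
    return [[dot(u, v) for v in nodes_b] for u in nodes_a]


# Hand-picked toy embeddings (assumption): graph_a has two distinct nodes,
# graph_b has two identical nodes that happen to average to the same vector.
graph_a = [[1.0, 0.0], [0.0, 1.0]]
graph_b = [[0.5, 0.5], [0.5, 0.5]]

# The pooled embeddings coincide, so the graph-level score cannot separate the pairs.
print(graph_level_similarity(graph_a, graph_a))
print(graph_level_similarity(graph_a, graph_b))

# The node-level matrices differ, so the distinction is still available downstream.
print(node_similarity_matrix(graph_a, graph_a))
print(node_similarity_matrix(graph_a, graph_b))
```

In this toy case the two graph-level scores are identical while the two node-level matrices are not, which is the kind of fine-grained information a single pooled vector can lose. How such a matrix is then turned into a similarity score is outside the scope of this sketch.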