In this work, we focus on the problem of large-graph similarity computation and propose a novel "embedding-coarsening-matching" learning framework, which outperforms state-of-the-art methods on this task and significantly improves time efficiency. Graph similarity computation for metrics such as Graph Edit Distance (GED) is typically NP-hard, and existing heuristic algorithms usually achieve an unsatisfactory trade-off between accuracy and efficiency. Recent deep learning techniques offer a promising data-driven solution: a network is trained to encode each graph as a feature vector, and similarity is computed from these vectors. These deep learning methods fall into two categories: embedding models and matching models. Embedding models such as GCN-Mean and GCN-Max, which map each graph directly to its own feature vector, run faster but usually perform poorly because there is no interaction across graphs. Matching models such as GMN, whose encoding process involves interaction across the two graphs, are more accurate, but interaction between whole graphs significantly increases running time (at least quadratic in the number of nodes). Inspired by the identification of large biological molecules, in which the whole molecule is first mapped to functional groups and then identified based on those groups, our "embedding-coarsening-matching" framework first embeds and coarsens large graphs into coarsened graphs with denser local topology, and then applies a matching mechanism to the coarsened graphs to produce the final similarity scores. Detailed experiments demonstrate the efficiency and effectiveness of the proposed framework.
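To make the three stages concrete, the following is a minimal NumPy sketch of an embed-coarsen-match pipeline. It is not the model proposed in this work: the single GCN-style layer, the soft cluster assignment (in the style of DiffPool, used here only as a stand-in for the coarsening step), the attention-based matching, and the exponential score are all assumptions chosen for illustration, as are the function names and the untrained random weights. The point it illustrates is the cost argument: cross-graph matching touches every node pair, so running it on k coarsened nodes instead of n original nodes reduces the pair count from n² to k².

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    """Embedding step: symmetric-normalized neighborhood aggregation, then ReLU."""
    a_hat = adj + np.eye(adj.shape[0])                 # add self-loops
    deg_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))    # degrees are >= 1
    a_norm = a_hat * deg_inv_sqrt[:, None] * deg_inv_sqrt[None, :]
    return np.maximum(a_norm @ feats @ weight, 0.0)

def coarsen(adj, feats, assign_weight):
    """Coarsening step (assumed): softly assign n nodes to k clusters."""
    logits = feats @ assign_weight                     # shape (n, k)
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    s = np.exp(logits)
    s /= s.sum(axis=1, keepdims=True)                  # row-wise softmax
    return s.T @ adj @ s, s.T @ feats                  # shapes (k, k), (k, d)

def cross_match(x1, x2):
    """Matching step (assumed): residual of each x1 row against its soft match in x2.

    The score matrix has rows(x1) * rows(x2) entries, which is the quadratic cost.
    """
    scores = x1 @ x2.T
    scores -= scores.max(axis=1, keepdims=True)
    att = np.exp(scores)
    att /= att.sum(axis=1, keepdims=True)
    return x1 - att @ x2

def similarity(g1, g2, params):
    """Embed and coarsen each graph independently, then match the coarse graphs."""
    w_embed, w_assign = params
    coarse = []
    for adj, feats in (g1, g2):
        h = gcn_layer(adj, feats, w_embed)
        _, h_coarse = coarsen(adj, h, w_assign)        # coarse adjacency unused here
        coarse.append(h_coarse)
    r12 = cross_match(coarse[0], coarse[1])
    r21 = cross_match(coarse[1], coarse[0])
    mismatch = 0.5 * (np.linalg.norm(r12, axis=1).mean()
                      + np.linalg.norm(r21, axis=1).mean())
    return float(np.exp(-mismatch))                    # illustrative score in (0, 1]

def random_graph(n, d, rng, p=0.1):
    """Hypothetical helper: undirected random graph with Gaussian node features."""
    upper = np.triu(rng.random((n, n)) < p, k=1).astype(float)
    return upper + upper.T, rng.standard_normal((n, d))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, k = 200, 8, 10
    params = (0.1 * rng.standard_normal((d, d)), 0.1 * rng.standard_normal((d, k)))
    g1, g2 = random_graph(n, d, rng), random_graph(n, d, rng)
    print(f"similarity score: {similarity(g1, g2, params):.4f}")
    print(f"cross-graph pairs: {n * n} on raw graphs vs {k * k} after coarsening")
```

With untrained weights the score itself is meaningless; in a learned setting the embedding and assignment weights would be fitted against ground-truth similarity labels such as GED.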