Subgraph similarity search is a fundamental operator in graph analysis. In this framework, given a query graph and a graph database, the goal is to identify subgraphs of the database graphs that are structurally similar to the query. Subgraph edit distance (SED) is one of the most expressive measures for subgraph similarity. In this work, we study the problem of learning SED from a training set of graph pairs and their SED values. Towards that end, we design a novel siamese graph neural network called NEUROSED, which learns an embedding space with a rich structure reminiscent of SED. With the help of a specially crafted inductive bias, NEUROSED not only enables high accuracy but also ensures that the predicted SED, like true SED, satisfies triangle inequality. The design is generic enough to also model graph edit distance (GED), while ensuring that the predicted GED space is metric, like the true GED space. Extensive experiments on real graph datasets, for both SED and GED, establish that NEUROSED achieves approximately 2 times lower RMSE than the state of the art and is approximately 18 times faster than the fastest baseline. Further, owing to its pair-independent embeddings and theoretical properties, NEUROSED allows approximately 3 orders of magnitude faster retrieval of graphs and subgraphs.
翻译:子图相似性搜索是图形分析的基本操作器。 在这个框架中, 给一个查询图和一个图形数据库, 目标是确定数据库图在结构上与查询相似的子图子。 Subgraph编辑距离( SED) 是子图相似性最清晰的措施之一 。 在这项工作中, 我们研究从一组培训的图形配对及其 SED 值中学习 SED 的问题 。 为此, 我们设计了一个叫做 NEUROSED 的新型图形神经网络, 这个网络学习一个嵌入空间, 其结构与 SED 具有丰富的相似性。 借助一个特别设计的感化偏差, NEUROSED 不仅能够实现高度的精确性, 而且还能确保预测的 SEDD 能够像真正的 SED 一样, 三角不平等性。 这个设计非常通用,足以模型编辑距离( GED), 同时确保预测的GED 空间与真正的 GED 空间一样, 在真实的图形数据集中进行广泛的实验, 确定 NEUROSED 达到大约2次的理论级的精确程度, 。