Finding the graphs that are most similar to a query graph in a large database is a common task with various applications. A widely-used similarity measure is the graph edit distance, which provides an intuitive notion of similarity and naturally supports graphs with vertex and edge attributes. Since its computation is NP-hard, techniques for accelerating similarity search have been studied extensively. However, index-based approaches for this are almost exclusively designed for graphs with categorical vertex and edge labels and uniform edit costs. We propose a filter-verification framework for similarity search, which supports non-uniform edit costs for graphs with arbitrary attributes. We employ an expensive lower bound obtained by solving an optimal assignment problem. This filter distance satisfies the triangle inequality, making it suitable for acceleration by metric indexing. In subsequent stages, assignment-based upper bounds are used to avoid further exact distance computations. Our extensive experimental evaluation shows that a significant runtime advantage over both a linear scan and state-of-the-art methods is achieved.
翻译:在大型数据库中查找与查询图最相似的图形是各种应用的共同任务。一个广泛使用的相似度测量是图形编辑距离,它提供了近似性和自然支持具有顶点和边缘属性的图形的直观概念。由于它的计算是NP硬的,所以已经对加速相似性搜索的技术进行了广泛的研究。然而,基于索引的方法几乎是专门为带有绝对顶点、边缘标签和统一编辑成本的图形设计的。我们建议为相似性搜索设计一个过滤核查框架,用于支持具有任意属性的图形的非统一编辑费用。我们采用一种通过解决最佳分配问题而获得的较低约束成本。这种过滤距离满足三角间的不平等,使之适合通过衡量索引加速。在以后的阶段,使用基于分配的上层来避免进一步的精确距离计算。我们广泛的实验评估显示,在线扫描和状态方法上都实现了巨大的运行时间优势。