The graph edit distance is an intuitive measure to quantify the dissimilarity of graphs, but its computation is NP-hard and challenging in practice. We introduce methods for efficiently answering nearest neighbor and range queries regarding this distance in large databases with up to millions of graphs. We build on the filter-verification paradigm, where lower and upper bounds are used to reduce the number of exact computations of the graph edit distance. Highly effective bounds for this purpose involve solving a linear assignment problem for each graph in the database, which is prohibitive for massive datasets. Index-based approaches typically provide only weak bounds, leading to high verification costs. In this work, we derive novel lower bounds for efficient filtering from restricted assignment problems in which the cost function is a tree metric. This special case allows the costs of optimal assignments to be embedded isometrically into $\ell_1$ space, making efficient indexing possible. We propose several lower bounds of the graph edit distance, obtained from tree metrics reflecting the edit costs, which are combined for effective filtering. Our method, termed EmbAssi, can be integrated into existing filter-verification pipelines as a fast and effective pre-filtering step. Empirically, we show that for many real-world graphs our lower bounds are already close to the exact graph edit distance, while our index construction and search scale to very large databases.
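The key property behind the $\ell_1$ embedding can be illustrated with a minimal sketch. Under a tree metric, the cost of an optimal assignment between two equal-size multisets of tree nodes equals $\sum_{v} w(v)\,\lvert c_A(v) - c_B(v)\rvert$, where $w(v)$ is the length of the edge from $v$ to its parent and $c_A(v)$ counts the elements of $A$ in the subtree of $v$. The tree, node numbering, and helper names below (`embed`, `tree_assignment_cost`, `brute_force_assignment`) are hypothetical and chosen for illustration only. The sketch checks the identity against brute-force enumeration; it does not reproduce EmbAssi's actual lower bounds or its handling of insertions and deletions.

```python
from itertools import permutations

# Hypothetical example tree given by parent pointers; node 0 is the root.
# weight[v] is the length of the edge from v to parent[v] (unused for the root).
parent = [-1, 0, 0, 1, 1, 2]
weight = [0.0, 1.0, 2.0, 0.5, 1.5, 1.0]


def path_to_root(v):
    """Return v followed by all of its ancestors up to and including the root."""
    path = []
    while v != -1:
        path.append(v)
        v = parent[v]
    return path


def depth(x):
    """Weighted distance from x to the root (one edge per non-root node on the path)."""
    return sum(weight[y] for y in path_to_root(x) if parent[y] != -1)


def tree_distance(u, v):
    """Tree metric: total weight of the edges on the unique u-v path."""
    ancestors_of_v = set(path_to_root(v))
    # The first ancestor of u that is also an ancestor of v is their lowest common ancestor.
    lca = next(x for x in path_to_root(u) if x in ancestors_of_v)
    return depth(u) + depth(v) - 2 * depth(lca)


def embed(multiset):
    """phi[v] = weight[v] * (number of elements in the subtree of v), for each non-root v."""
    phi = [0.0] * len(parent)
    for x in multiset:
        # x lies in the subtree of every node on its path to the root.
        for y in path_to_root(x):
            if parent[y] != -1:
                phi[y] += weight[y]
    return phi


def l1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def tree_assignment_cost(A, B):
    """Optimal assignment cost for equal-size multisets, computed as an l1 distance."""
    return l1(embed(A), embed(B))


def brute_force_assignment(A, B):
    """Reference solution: try every bijection between equal-size multisets A and B."""
    return min(sum(tree_distance(a, b) for a, b in zip(A, perm))
               for perm in permutations(B))


A, B = [3, 4, 5], [1, 2, 5]
print(tree_assignment_cost(A, B), brute_force_assignment(A, B))  # 5.0 5.0
```

Because `embed` depends on only one multiset, the vectors for all database entries can be computed once and stored in a standard $\ell_1$ index; a query then needs only vector distances rather than one assignment problem per database graph. This is the advantage the abstract attributes to restricting the cost function to a tree metric.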