The graph retrieval problem is to search a large corpus of graphs for those most similar to a query graph. A common basis for scoring similarity is the maximum common subgraph (MCS) between the query and corpus graphs, usually measured by the number of common edges (i.e., MCES). In some applications, it is also desirable that the common subgraph be connected, i.e., the maximum common connected subgraph (MCCS). Finding the exact MCES and MCCS is intractable, but this may be unnecessary if the goal is to rank corpus graphs by relevance. We design fast and trainable neural functions that approximate MCES and MCCS well. Late interaction methods compute dense representations for the query and corpus graphs separately, and compare these representations using simple similarity functions only at the last stage, leading to highly scalable systems. Early interaction methods combine information from both graphs right from the input stage; they are usually considerably more accurate, but slower. We propose both late and early interaction neural formulations for MCES and MCCS. Both are based on a continuous relaxation of a node alignment matrix between query and corpus nodes. For MCCS, we propose a novel differentiable network for estimating the size of the largest connected common subgraph. Extensive experiments with seven data sets show that our proposals are superior among late interaction models in terms of both accuracy and speed. Our early interaction models provide accuracy competitive with the state of the art, at substantially greater speed.
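To make the central idea concrete, the following is a minimal, hypothetical sketch (not the paper's actual architecture) of how a continuously relaxed node alignment matrix can yield a differentiable surrogate for the MCES size. It assumes dense adjacency matrices padded to a common size, random node embeddings standing in for a trained graph encoder, and a Sinkhorn normalization as the relaxation; the function names `sinkhorn` and `soft_mces_score` are illustrative only.

```python
# Illustrative sketch: scoring an MCES-style similarity through a relaxed node alignment.
# Assumptions (not from the paper): Sinkhorn relaxation, padded dense adjacencies,
# and random embeddings in place of a trained GNN encoder.
import torch

def sinkhorn(log_scores: torch.Tensor, iters: int = 20, tau: float = 0.1) -> torch.Tensor:
    """Relax a hard node permutation into a (nearly) doubly stochastic alignment matrix."""
    log_p = log_scores / tau
    for _ in range(iters):
        log_p = log_p - torch.logsumexp(log_p, dim=-1, keepdim=True)  # row normalization
        log_p = log_p - torch.logsumexp(log_p, dim=-2, keepdim=True)  # column normalization
    return log_p.exp()

def soft_mces_score(adj_q, adj_c, emb_q, emb_c):
    """Differentiable surrogate for MCES size: softly count query edges whose endpoints
    map onto an edge of the corpus graph under the relaxed alignment P."""
    P = sinkhorn(emb_q @ emb_c.T)          # soft node alignment matrix
    aligned = P @ adj_c @ P.T              # corpus adjacency pulled back onto query nodes
    return 0.5 * torch.sum(torch.minimum(adj_q, aligned))  # each common edge counted once

# Toy usage: random graphs and embeddings, query padded to the corpus graph's size.
nq, nc, d = 5, 7, 16
adj_q = (torch.rand(nq, nq) > 0.6).float()
adj_q = torch.triu(adj_q, 1); adj_q = adj_q + adj_q.T
adj_c = (torch.rand(nc, nc) > 0.6).float()
adj_c = torch.triu(adj_c, 1); adj_c = adj_c + adj_c.T
emb_q, emb_c = torch.randn(nq, d), torch.randn(nc, d)
adj_q = torch.nn.functional.pad(adj_q, (0, nc - nq, 0, nc - nq))
emb_q = torch.nn.functional.pad(emb_q, (0, 0, 0, nc - nq))
print(soft_mces_score(adj_q, adj_c, emb_q, emb_c))
```

In a late interaction setting, one would compute `emb_q` and `emb_c` independently offline and apply only the cheap alignment-and-count step at query time; an early interaction variant would instead let the two graphs exchange information while the embeddings are being computed.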