Many problems in areas as diverse as recommendation systems, social network analysis, semantic search, and distributed root cause analysis can be modeled as pattern search on labeled graphs (also called "heterogeneous information networks" or HINs). Given a large graph and a query pattern with node and edge label constraints, a fundamental challenge is to find the top-k matches according to a ranking function over edge and node weights. For users, it is difficult to choose a suitable value of k. We therefore propose the novel notion of an any-k ranking algorithm: for a given time budget, return as many of the top-ranked results as possible. Then, given additional time, produce the next lower-ranked results quickly as well. The algorithm can be stopped at any time, but may have to continue until all results are returned. This paper focuses on acyclic patterns over arbitrary labeled graphs. We are interested in practical algorithms that effectively exploit (1) properties of heterogeneous networks, in particular selective constraints on labels, and (2) the fact that users often explore only a fraction of the top-ranked results. Our solution, KARPET, carefully integrates aggressive pruning that leverages the acyclic nature of the query with incremental guided search. This design enables us to prove strong non-trivial time and space guarantees, which is generally considered very hard for this type of graph search problem. Through experimental studies we show that KARPET achieves running times on the order of milliseconds for tree patterns on large networks with millions of nodes and edges.
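The any-k behavior described above can be illustrated with a minimal Python sketch. It is not KARPET itself. It assumes a directed graph, a path-shaped pattern (a simple special case of an acyclic pattern), and a ranking by the sum of edge weights in which lower is better. The names `any_k_paths`, `LABELS`, `EDGES`, and `PATTERN` are hypothetical. A backward pass discards nodes that cannot take part in any match and records the cheapest way to complete a match from each remaining node. A best-first search guided by those values then yields matches one at a time in rank order.

```python
import heapq
from itertools import islice


def any_k_paths(labels, edges, pattern):
    """Yield (total_weight, path) for matches of a label path pattern, lightest first.

    labels:  dict mapping node -> label
    edges:   dict mapping node -> list of (neighbor, weight), directed
    pattern: list of labels that consecutive path nodes must carry
    Illustrative sketch only: it does not check for repeated nodes, which
    cannot occur when the pattern's labels are distinct, as assumed here.
    """
    n = len(pattern)

    # Backward pass (pruning): best[i][v] is the minimum weight needed to
    # complete a match from node v placed at pattern position i. Nodes that
    # cannot be completed are simply absent from best[i].
    best = [dict() for _ in range(n)]
    for v, lab in labels.items():
        if lab == pattern[-1]:
            best[n - 1][v] = 0.0
    for i in range(n - 2, -1, -1):
        for v, lab in labels.items():
            if lab != pattern[i]:
                continue
            options = [w + best[i + 1][u]
                       for u, w in edges.get(v, [])
                       if u in best[i + 1]]
            if options:
                best[i][v] = min(options)

    # Forward pass (guided search): heap entries are
    # (lowest weight of any full match extending the prefix, prefix weight, prefix).
    heap = [(bound, 0.0, (v,)) for v, bound in best[0].items()]
    heapq.heapify(heap)
    while heap:
        _, cost, path = heapq.heappop(heap)
        i = len(path) - 1
        if i == n - 1:
            yield cost, path  # next-best complete match
            continue
        for u, w in edges.get(path[-1], []):
            if u in best[i + 1]:
                new_cost = cost + w
                heapq.heappush(heap, (new_cost + best[i + 1][u], new_cost, path + (u,)))


# Hypothetical toy network: authors write papers, papers appear at a venue.
LABELS = {"a1": "author", "a2": "author", "p1": "paper", "p2": "paper", "v1": "venue"}
EDGES = {
    "a1": [("p1", 1.0), ("p2", 4.0)],
    "a2": [("p1", 2.0)],
    "p1": [("v1", 1.0)],
    "p2": [("v1", 1.0)],
}
PATTERN = ["author", "paper", "venue"]

if __name__ == "__main__":
    # The caller decides when to stop; here, after the first two matches.
    for weight, path in islice(any_k_paths(LABELS, EDGES, PATTERN), 2):
        print(weight, path)
```

Because the function is a generator, no value of k has to be fixed in advance: the caller can consume results until a time budget runs out and resume later from the same generator to obtain the next lower-ranked matches.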