Graph data management and querying has many practical applications. When graphs are very heterogeneous and/or users are unfamiliar with their structure, they may need to find how two or more groups of nodes are connected in a graph, even when users are not able to describe the connections. This is only partially supported by existing query languages, which allow searching for paths, but not for trees connecting three or more node groups. The latter is related to the NP-hard Group Steiner Tree problem, and has been previously considered for keyword search in databases. In this work, we formally show how to integrate connecting tree patterns (CTPs, in short) within a graph query language such as SPARQL or Cypher, leading to an Extended Query Language (or EQL, in short). We then study a set of algorithms for evaluating CTPs; we generalize prior keyword search work, most importantly by (i) considering bidirectional edge traversal and (ii) allowing users to select any score function for ranking CTP results. To cope with very large search spaces, we propose an efficient pruning technique and formally establish a large set of cases where our algorithm, MOLESP, is complete even with pruning. Our experiments validate the performance of our CTP and EQL evaluation algorithms on a large set of synthetic and real-world workloads.
翻译:图表数据管理和查询有许多实际应用。 当图形非常多样化和(或)用户对其结构不熟悉时, 他们可能需要在图形中找到如何将两个或更多组节点连接在一起, 即使用户无法描述连接。 这只有部分得到现有查询语言的支持, 这些语言允许搜索路径, 而不是连接三个或更多节点组的树。 后者与NP- 硬类 Steiner Tree 问题有关, 并曾考虑在数据库中搜索关键字。 在这项工作中, 我们正式展示如何将连接树型( CTPs, 简称) 整合到像 SPARQL 或 Cypher 这样的图表查询语言中, 导致扩展的Query 语言( 简称 EQL ) 。 我们随后研究一套用于评价 CTPs 的算法; 我们将先前的关键词搜索工作普遍化, 最重要的是 (i) 考虑双向边缘路过, 以及 (ii) 允许用户选择任何CTP 结果的评分数功能。 为了适应非常大的搜索空间, 我们提议一种高效的运行技术, 并正式建立一套我们的QL 级演算系统。