Join query evaluation with ordering is a fundamental data processing task in relational database management systems. SQL and custom graph query languages such as Cypher offer this functionality by allowing users to specify the order via the ORDER BY clause. In many scenarios, the users also want to see the first $k$ results quickly (expressed by the LIMIT clause), but the value of $k$ is not predetermined as user queries are arriving in an online fashion. Recent work has made considerable progress in identifying optimal algorithms for ranked enumeration of join queries that do not contain any projections. In this paper, we initiate the study of the problem of enumerating results in ranked order for queries with projections. Our main result shows that for any acyclic query, it is possible to obtain a near-linear (in the size of the database) delay algorithm after only a linear time preprocessing step for two important ranking functions: sum and lexicographic ordering. For a practical subset of acyclic queries known as star queries, we show an even stronger result that allows a user to obtain a smooth tradeoff between faster answering time guarantees using more preprocessing time. Our results are also extensible to queries containing cycles and unions. We also perform a comprehensive experimental evaluation to demonstrate that our algorithms, which are simple to implement, improve up to three orders of magnitude in the running time over state-of-the-art algorithms implemented within open-source RDBMS and specialized graph databases.
翻译:与订单连接的查询评估是关系数据库管理系统中的基本数据处理任务。 SQL 和 Cypher 等自定义图形查询语言( SQL 和 Cypher ) 提供了这一功能,允许用户通过 ORDER 条款指定顺序。 在许多情形中,用户还希望快速看到第一批K美元结果( 由 LIMIT 条款表示), 但用户查询以在线方式到达时, 并没有预先确定美元的价值。 最近的工作在确定对合并查询进行分级的最佳算法方面取得了相当大的进展。 在本文件中,我们开始研究将结果按照排序顺序进行计算以进行预测。 我们的主要结果显示,对于任何周期查询,只要有一个线性( 数据库大小), 就可以在两个重要排序函数( 总数和 词汇顺序排序) 的线性预处理步骤后获得一个近线性的延迟算法。 对于被称为星质查询的周期性查询,我们展示了一个更强有力的结果, 使用户能够利用更精确的预处理时间来计算结果。 我们的主要结果还显示, 运行的序列周期将执行一个简单的实验性排序。