We investigate the enumeration of top-k answers for conjunctive queries against relational databases according to a given ranking function. The task is to design data structures and algorithms that allow for efficient enumeration after a preprocessing phase. Our main contribution is a novel priority-queue-based algorithm with near-optimal delay and non-trivial space guarantees that are output-sensitive and depend on the structure of the query. In particular, we exploit certain desirable properties of ranking functions that frequently occur in practice, together with degree information in the database instance, to enable efficient enumeration. We introduce the notions of {\em decomposable} and {\em compatible} ranking functions in conjunction with query decomposition, a property that allows for partial aggregation of tuple scores in order to efficiently enumerate the ranked output. We complement the algorithmic results with lower bounds that justify why certain assumptions about the properties of ranking functions are necessary, and we discuss popular conjectures that provide evidence for the optimality of the enumeration delay guarantees. Our results extend and improve upon a long line of work that has studied ranked enumeration from both theoretical and practical perspectives.
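To make the idea of priority-queue-based ranked enumeration concrete, the following is a minimal sketch and not the algorithm of the paper. It assumes a single two-relation join R(a, b) ⋈ S(b, c), per-tuple integer weights, and SUM as the ranking function (taken here as a simple example of a function whose score can be aggregated from partial scores). The function name `ranked_join` and the tuple layout are hypothetical choices made for illustration only.

```python
import heapq
from collections import defaultdict
from itertools import count, islice


def ranked_join(R, S):
    """Yield answers of R(a, b) JOIN S(b, c) in nondecreasing total weight.

    R is an iterable of (a, b, weight); S is an iterable of (b, c, weight).
    Assumption: the rank of an answer is the SUM of its two tuple weights.
    """
    # Preprocessing: bucket both relations by the join key b.
    r_by_b, s_by_b = defaultdict(list), defaultdict(list)
    for a, b, w in R:
        r_by_b[b].append((w, a))
    for b, c, w in S:
        s_by_b[b].append((w, c))

    heap = []        # entries: (total weight, tie-breaker, b, i, j)
    seen = set()     # index pairs (b, i, j) already pushed
    tie = count()    # insertion counter so heap ties never compare b
    for b in r_by_b:
        if b in s_by_b:
            # Sort each bucket by weight; the lightest pair is (0, 0).
            r_by_b[b].sort(key=lambda t: t[0])
            s_by_b[b].sort(key=lambda t: t[0])
            total = r_by_b[b][0][0] + s_by_b[b][0][0]
            heapq.heappush(heap, (total, next(tie), b, 0, 0))
            seen.add((b, 0, 0))

    # Enumeration: pop the cheapest candidate, then push its two successors.
    while heap:
        total, _, b, i, j = heapq.heappop(heap)
        rs, ss = r_by_b[b], s_by_b[b]
        yield (rs[i][1], b, ss[j][1]), total
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(rs) and nj < len(ss) and (b, ni, nj) not in seen:
                seen.add((b, ni, nj))
                heapq.heappush(heap, (rs[ni][0] + ss[nj][0], next(tie), b, ni, nj))


if __name__ == "__main__":
    R = [("a1", 1, 2), ("a2", 1, 6), ("a3", 2, 1)]
    S = [(1, "c1", 4), (1, "c2", 1), (2, "c3", 8), (3, "c4", 1)]
    # Only the first k answers are materialized; the rest are never computed.
    for answer, weight in islice(ranked_join(R, S), 3):
        print(answer, weight)
```

Because the generator is lazy, a top-k request stops after k pops, and each pop pushes at most two new heap entries, so the work per answer is logarithmic in the heap size for this simple query. General conjunctive queries and other ranking functions need the machinery the abstract describes; this sketch does not attempt to cover them.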