基于PRAM的工作高效查询评估 (Work-Efficient Query Evaluation with PRAMs)

The article studies query evaluation in parallel constant time in the CRCW PRAM model. While it is well-known that all relational algebra queries can be evaluated in constant time on an appropriate CRCW PRAM model, this article is interested in the efficiency of evaluation algorithms, that is, in the number of processors or, asymptotically equivalent, in the work. Naive evaluation in the parallel setting results in huge (polynomial) bounds on the work of such algorithms and in presentations of the result sets that can be extremely scattered in memory. The article discusses some obstacles for constant-time PRAM query evaluation. It presents algorithms for relational operators and explores three settings, in which efficient sequential query evaluation algorithms exist: acyclic queries, semijoin algebra queries, and join queries -- the latter in the worst-case optimal framework. Under mild assumptions -- that data values are numbers of polynomial size in the size of the database or that the relations of the database are suitably sorted -- constant-time algorithms are presented that are weakly work-efficient in the sense that work $\mathcal{O}(T^{1+\varepsilon})$ can be achieved, for every $\varepsilon>0$, compared to the time $T$ of an optimal sequential algorithm. Important tools are the algorithms for approximate prefix sums and compaction from Goldberg and Zwick (1995).

翻译：本文研究了CRCW PRAM模型中的并行常数时间查询评估问题。尽管已知所有关系代数查询均可在适当的CRCW PRAM模型中以常数时间完成评估，但本文关注评估算法的效率问题，即处理器数量（或渐近等价的工作量）。在并行环境下进行朴素评估会导致此类算法的工作量呈现巨大（多项式级）上界，且结果集在内存中的分布可能极度分散。本文探讨了常数时间PRAM查询评估面临的若干障碍，提出了关系运算符的并行算法，并研究了三种存在高效顺序查询评估算法的场景：无环查询、半连接代数查询以及最坏情况最优框架下的连接查询。在温和假设下（数据值为数据库规模的多项式大小数值，或数据库关系经过适当排序），本文提出的常数时间算法实现了弱工作高效性，即对于任意ε>0，其工作量可达O(T^(1+ε))，其中T为最优顺序算法的时间复杂度。核心工具为Goldberg与Zwick（1995）提出的近似前缀和算法与压缩算法。