Many Sequential Iterative Algorithms Can Be Parallel and (Nearly) Work-efficient

In the effort to design efficient parallel algorithms, recent papers have shown that many sequential iterative algorithms can be directly parallelized, but achieving work-efficiency and high parallelism remains challenging. Work-efficiency can be hard for problems where the number of dependences is asymptotically larger than the optimal sequential work bound. To achieve high parallelism, we want to process as many objects as possible in parallel. The goal is to achieve $\tilde{O}(D)$ span for a problem whose deepest dependence chain has length $D$. We refer to this property as round-efficiency. In this paper, we show work-efficient and round-efficient algorithms for a variety of classic problems and propose general approaches for obtaining them. To efficiently parallelize many sequential iterative algorithms, we propose the phase-parallel framework. The framework assigns a rank to each object and processes the objects in rank order. All objects with the same rank can be processed in parallel. To enable work-efficiency and high parallelism, we use two types of general techniques. Type 1 algorithms use range queries to extract all objects with the same rank, which avoids evaluating all the dependences. We discuss activity selection, unlimited knapsack, and more using the Type 1 framework. Type 2 algorithms wake up an object when the last object it depends on is finished. We discuss activity selection, longest increasing subsequence (LIS), and many other algorithms using the Type 2 framework. All of our algorithms are (nearly) work-efficient and round-efficient. Many of them improve the previous best bounds, and some of them are the first to achieve work-efficiency together with round-efficiency. We also implement many of them. On inputs with reasonable dependence depth, our algorithms are highly parallelized and significantly outperform their sequential counterparts.
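To make the rank-based processing concrete, the following is a minimal sketch of the phase-parallel idea applied to LIS. It is an illustration under stated assumptions, not the algorithm from the paper: the function name `phase_parallel_lis_ranks` is hypothetical, the rounds are simulated sequentially, and each round rescans all unfinished elements, so the sketch does $O(nD)$ work and is not work-efficient. Here the rank of an element is taken to be the length of the longest strictly increasing subsequence ending at it, and the objects of the next rank are exactly the prefix minima of the elements that remain.

```python
def phase_parallel_lis_ranks(a):
    """Return rank[i] = length of the longest strictly increasing
    subsequence of `a` that ends at a[i].

    Hypothetical illustration only: rounds run one after another here,
    and each round rescans every unfinished element.
    """
    rank = [0] * len(a)
    remaining = list(range(len(a)))  # unfinished indices, in input order
    current_rank = 0
    while remaining:
        current_rank += 1
        frontier, rest = [], []
        smallest = float("inf")
        # An unfinished element belongs to this round exactly when no
        # unfinished element before it is strictly smaller, i.e. it is a
        # prefix minimum of the remaining sequence.
        for i in remaining:
            if a[i] <= smallest:
                frontier.append(i)
                smallest = a[i]
            else:
                rest.append(i)
        # Objects in the frontier do not depend on each other, so a
        # parallel implementation could process them simultaneously.
        for i in frontier:
            rank[i] = current_rank
        remaining = rest
    return rank


if __name__ == "__main__":
    ranks = phase_parallel_lis_ranks([3, 1, 4, 1, 5, 9, 2, 6])
    print(ranks)       # rank of each element
    print(max(ranks))  # LIS length, which equals the number of rounds
```

The number of rounds equals the largest rank, which is the dependence depth $D$ in the abstract's terminology. The techniques described above (range queries for Type 1, wake-ups for Type 2) address the part this sketch does naively: finding each round's objects without rescanning every unfinished element.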
