Data structures for efficient sampling from a set of weighted items are an important building block of many applications. However, few parallel solutions are known. We close many of these gaps for both shared-memory and distributed-memory machines. We give efficient, fast, and practicable parallel algorithms for building data structures that support sampling single items (alias tables, compressed data structures). This also yields a simplified and more space-efficient sequential algorithm for alias table construction. Our approaches to sampling $k$ out of $n$ items with or without replacement and to subset (Poisson) sampling are output-sensitive, i.e., the sampling algorithms use work linear in the number of different samples. This is also interesting in the sequential case. Weighted random permutation can be done by sorting appropriate random deviates. We show that this is possible with linear work using a nonlinear transformation of these deviates. Finally, we give a communication-efficient, highly scalable approach to (weighted and unweighted) reservoir sampling. This algorithm is based on a fully distributed model of streaming algorithms that might be of independent interest. Experiments for alias tables and sampling with replacement show near-linear speedups for both construction and queries using up to 158 threads of shared-memory machines. An experimental evaluation of distributed weighted reservoir sampling on up to 256 nodes (5120 cores) also shows good speedups.
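For readers unfamiliar with alias tables, the following minimal sketch may help. It shows the classic sequential construction (in the style of Vose's method), not the parallel or space-efficient algorithms announced above. The function names `build_alias_table` and `sample` are hypothetical and chosen only for illustration.

```python
import random


def build_alias_table(weights):
    """Build an alias table for positive weights (classic sequential method).

    Returns (prob, alias): bucket i yields item i with probability prob[i]
    and item alias[i] otherwise.
    """
    n = len(weights)
    total = sum(weights)
    # Scale the weights so that the average bucket load is exactly 1.
    scaled = [w * n / total for w in weights]
    prob = [1.0] * n
    alias = list(range(n))
    light = [i for i, s in enumerate(scaled) if s < 1.0]
    heavy = [i for i, s in enumerate(scaled) if s >= 1.0]
    while light and heavy:
        l = light.pop()
        h = heavy.pop()
        # Bucket l keeps its own mass and is topped up from heavy item h.
        prob[l] = scaled[l]
        alias[l] = h
        scaled[h] -= 1.0 - scaled[l]
        (light if scaled[h] < 1.0 else heavy).append(h)
    # Any leftover buckets keep prob 1.0, which absorbs rounding error.
    return prob, alias


def sample(prob, alias, rng=random):
    """Draw one item in O(1): pick a uniform bucket, then flip a biased coin."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]
```

Construction takes time linear in the number of items, and each query costs one uniform bucket choice plus one biased coin flip, which is why alias tables are a natural building block for sampling with replacement.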