An alias table is a data structure that allows for efficiently drawing weighted random samples in constant time and can be constructed in linear time. The PSA algorithm by H\"ubschle-Schneider and Sanders is able to construct alias tables in parallel on the CPU. In this report, we transfer the PSA algorithm to the GPU. Our construction algorithm achieves a speedup of 17 on a consumer GPU in comparison to the PSA method on a 16-core high-end desktop CPU. For sampling, we achieve an up to 24 times higher throughput. Both operations also require several times less energy than on the CPU. Adaptations helping to achieve this include changing memory access patterns to do coalesced access. Where this is not possible, we first copy data to the faster shared memory using coalesced access. We also enhance a generalization of binary search enabling to search for a range of items in parallel. Besides naive sampling, we also give improved batched sampling algorithms.
翻译:化名表是一个数据结构,它允许在固定时间高效地抽取加权随机样本,并且可以以线性时间构建。 H\"ubschle-Schneider和Sanders的PSA算法能够平行地在CPU上建立别名表。我们在本报告中将PSA算法转到了GPU。我们的建设算法在消费性GPU上实现了17个加速,在16个核心高端桌面CPU上与PSA方法相比较。在取样方面,我们达到了最高24倍的通过量。两种操作也需要比CPU多几倍的能量。为了实现这一点,我们所需要的适应包括改变内存访问模式,以便进行联结访问。在无法这样做的情况下,我们首先将数据复制到使用煤化访问的更快的共享记忆中。我们还加强了能够同时搜索一系列项目的二元搜索的常规化。除了天性取样之外,我们还改进了分批抽样算法。