In this paper, we present FLiMS, a highly-efficient and simple parallel algorithms for merging two sorted lists residing in banked and/or wide memory. On FPGAs, its implementation uses fewer hardware resources than the state-of-the-art alternatives, due to the reduced number of comparators and elimination of redundant logic found on prior attempts. In combination with the distributed nature of the selector stage, a higher performance is achieved for the same amount of parallelism or higher. This is useful in many applications such as in parallel merge trees to achieve high-throughput sorting, where the resource utilisation of the merger is critical for building larger trees and internalising the workload for faster computation. Also presented are efficient variations of FLiMS for optimizing throughput for skewed datasets, achieving stable sorting or using fewer dequeue signals. FLiMS is also shown to perform well as conventional software on modern CPUs supporting single-instruction multiple-data (SIMD) instructions, surpassing the performance of some standard libraries for sorting.
翻译:本文介绍FLIMS, 这是一种高效和简单的平行算法, 用于合并存放在银行和(或)宽度记忆中的两种分类列表。 在FPGAs上, 其实施使用硬件资源少于最先进的替代方法, 原因是参照国数量减少, 消除了先前尝试中发现的重复逻辑。 结合选择阶段的分布性质, 在相同数量的平行或更高程度上实现了更高的性能。 这在许多应用中非常有用, 比如在平行合并树上实现高通量排序, 合并的资源利用对于建造更大的树木和内部化工作量以更快的计算至关重要。 还介绍了FLIMS在优化偏斜数据集的吞吐量、实现稳定排序或使用较少的脱结信号方面的有效变化。 FLIMS还展示了支持单线多数据( SIMD) 指示的现代计算机软件的运行以及常规软件, 超过了某些标准图书馆进行分类的性能。