Leyenda: An Adaptive, Hybrid Sorting Algorithm for Large Scale Data with Limited Memory

Sorting is one of the fundamental tasks of modern data management systems. Disk I/O is the most commonly blamed performance bottleneck, but as workloads become more computation-intensive, we observe that in heterogeneous environments the bottleneck may vary across infrastructures. As a result, sort kernels need to adapt to changing hardware conditions. In this paper, we propose Leyenda, a hybrid, parallel and efficient Radix Most-Significant-Bit (MSB) MergeSort algorithm that exploits local thread-level CPU caches and efficient disk/memory I/O. Leyenda can perform either internal or external sort efficiently, depending on I/O and processing conditions. We benchmarked Leyenda with three different workloads from Sort Benchmark, targeting three distinct use cases (internal, partially in-memory and external sort), and found that Leyenda outperforms GNU's parallel in-memory quick/merge sort implementations by up to three times. Leyenda also ranked as the second-best external sort algorithm in the ACM SIGMOD 2019 Programming Contest and fourth overall.
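
To make the general idea behind an MSB radix partitioning step concrete, here is a minimal sketch. It is not Leyenda's implementation: the function name `msb_radix_sort`, the `workers` parameter, the use of byte-string keys, and the choice of Python's built-in `sorted` for each bucket are all assumptions made for illustration. Keys are split into buckets by their most significant byte, each bucket is sorted independently (here on a thread pool), and the buckets are concatenated in order.

```python
from concurrent.futures import ThreadPoolExecutor


def msb_radix_sort(keys, workers=4):
    """Illustrative only: partition byte-string keys by their first byte,
    sort each bucket independently, then concatenate the buckets in order."""
    # Bucket 0 holds empty keys; buckets 1..256 hold keys by first byte value.
    buckets = [[] for _ in range(257)]
    for key in keys:
        index = key[0] + 1 if key else 0
        buckets[index].append(key)

    # Buckets are disjoint key ranges, so they can be sorted independently.
    # (Assumption: a thread pool stands in for the paper's parallel workers;
    # in CPython this shows the structure, not a real speedup.)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sorted_buckets = list(pool.map(sorted, buckets))

    # Bucket order already matches the order of the leading byte,
    # so simple concatenation yields a fully sorted result.
    result = []
    for bucket in sorted_buckets:
        result.extend(bucket)
    return result
```

Because every key in one bucket compares below every key in the next, no merge across buckets is needed in this in-memory sketch; an external variant would presumably spill buckets or sorted runs to disk and merge them, which is outside what this example shows.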
