Recent advancements in learned index structures propose replacing existing index structures, like B-Trees, with approximate learned models. In this work, we present a unified benchmark that compares well-tuned implementations of three learned index structures against several state-of-the-art "traditional" baselines. Using four real-world datasets, we demonstrate that learned index structures can indeed outperform non-learned indexes in read-only in-memory workloads over a dense array. We also investigate the impact of caching, pipelining, dataset size, and key size. We study the performance profile of learned index structures, and build an explanation for why learned models achieve such good performance. Finally, we investigate other important properties of learned index structures, such as their performance in multi-threaded systems and their build times.
翻译:学习指数结构的最近进展建议用大致的学习模型取代现有指数结构,如B-Trees。在这项工作中,我们提出了一个统一的基准,将三个学习指数结构的完善实施与几个最先进的“传统”基线进行比较。我们利用四个真实世界的数据集,证明学习指数结构确实能够超过在密集阵列中只读的模拟工作量的非学习指数。我们还调查了缓存、管线、数据集大小和关键大小的影响。我们研究了学习指数结构的绩效概况,并解释了学习模型为何能取得这种良好业绩。最后,我们调查了学习指数结构的其他重要特性,例如它们在多读系统中的表现及其构建时间。