Recent work on learned indexes opens a new direction for the indexing field. The key insight of the learned index is to approximate the mapping between keys and positions with piece-wise linear functions. Such methods require partitioning the key space to obtain a better approximation. Although many heuristics have been proposed to improve the approximation quality, the bottleneck is that segmentation overheads can hinder overall performance. This paper tackles the approximation problem by applying a \textit{distribution transformation} to the keys before constructing the learned index. We propose a two-stage Normalizing-Flow-based Learned index framework (NFL), which first transforms the original complex key distribution into a near-uniform distribution and then builds a learned index on the transformed keys. For effective distribution transformation, we propose a Numerical Normalizing Flow (Numerical NF). Based on the characteristics of the transformed keys, we propose a robust After-Flow Learned Index (AFLI). To validate the performance, we conduct comprehensive evaluations on both synthetic and real-world workloads, which show that the proposed NFL achieves the highest throughput and the lowest tail latency compared to state-of-the-art learned indexes.
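The core idea above can be illustrated with a minimal sketch. This is not the paper's NFL/AFLI implementation: instead of a learned normalizing flow, it uses a simple empirical-CDF transform (a monotone map that makes the key distribution near-uniform), then predicts positions with a single linear model plus a bounded last-mile search. All names (`make_cdf_transform`, `TransformedLinearIndex`, the error bound `err`) are illustrative assumptions.

```python
import bisect

def make_cdf_transform(sample):
    """Monotone transform approximating the empirical CDF of `sample`.

    Illustrative stand-in for the paper's Numerical NF: any monotone
    map that sends keys toward a near-uniform distribution on [0, 1].
    """
    anchors = sorted(sample)
    n = len(anchors)
    def transform(key):
        # rank of `key` among the sampled anchors, scaled to [0, 1]
        return bisect.bisect_left(anchors, key) / n
    return transform

class TransformedLinearIndex:
    """Sketch of a learned index built on transformed keys.

    Because the transformed keys are near-uniform on [0, 1], the
    key -> position mapping is close to linear, so one linear model
    (position ~ t * (n - 1)) suffices instead of many segments.
    """
    def __init__(self, keys, transform):
        self.keys = sorted(keys)
        self.transform = transform
        self.n = len(self.keys)

    def lookup(self, key, err=8):
        t = self.transform(key)
        pred = int(t * (self.n - 1))       # predicted position
        lo = max(0, pred - err)            # last-mile search window,
        hi = min(self.n, pred + err + 1)   # bounded by the error `err`
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < self.n and self.keys[i] == key:
            return i
        return None
```

A skewed workload (e.g. quadratically spaced keys) shows the benefit: after the CDF transform the prediction error stays within a small constant, so the last-mile search window can be tiny.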