Targeting in-memory one-dimensional search keys, we propose a novel DIstribution-driven Learned Index tree (DILI), where a concise and computation-efficient linear regression model is used for each node. An internal node's key range is equally divided among its child nodes, so a key search enjoys perfect model prediction accuracy when locating the relevant leaf node. A leaf node uses machine learning models to generate a searchable data layout and thus accurately predicts the data record position for a key. To construct DILI, we first build a bottom-up tree with linear regression models according to global and local key distributions. Using the bottom-up tree, we build DILI in a top-down manner, individualizing the fanouts of internal nodes according to local distributions. DILI strikes a good balance between the number of leaf nodes and the height of the tree, two critical factors of key search time. Moreover, we design flexible algorithms for DILI to efficiently insert and delete keys and to automatically adjust the tree structure when necessary. Extensive experimental results show that DILI outperforms state-of-the-art alternatives on different kinds of workloads.
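To make the central property concrete, the sketch below illustrates how an internal node whose key range is split into equal-width child ranges can locate the correct child with a single linear computation and no prediction error. This is only a minimal illustration of the idea stated in the abstract, not the authors' implementation; all names (`InternalNode`, `child_for`, the example range and fanout) are assumptions introduced here.

```cpp
// Minimal sketch (assumed names, not the paper's code) of an internal node
// in the DILI spirit: its key range [lo, hi) is divided into `fanout` equal
// sub-ranges, so a linear function maps a key to exactly one child index.
#include <cstdio>
#include <cstddef>

struct InternalNode {
    double lo, hi;       // key range covered by this node
    std::size_t fanout;  // number of equally sized child ranges

    // Linear model: slope * (key - lo) gives the child index.
    // Because children split [lo, hi) evenly, the prediction is exact.
    std::size_t child_for(double key) const {
        double slope = static_cast<double>(fanout) / (hi - lo);
        auto idx = static_cast<std::size_t>(slope * (key - lo));
        return idx >= fanout ? fanout - 1 : idx;  // clamp key == hi
    }
};

int main() {
    // Hypothetical node: keys in [0, 1000), 8 children.
    InternalNode node{0.0, 1000.0, 8};
    for (double key : {12.5, 499.0, 999.9}) {
        std::printf("key %.1f -> child %zu\n", key, node.child_for(key));
    }
    return 0;
}
```

Under this equal-division scheme, traversal cost per internal node is a constant-time arithmetic step, which is why the abstract frames search time as governed mainly by the number of leaf nodes and the tree height.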