Index plays an essential role in modern database engines to accelerate the query processing. The new paradigm of "learned index" has significantly changed the way of designing index structures in DBMS. The key insight is that indexes could be regarded as learned models that predict the position of a lookup key in the dataset. While such studies show promising results in both lookup time and index size, they cannot efficiently support update operations. Although recent studies have proposed some preliminary approaches to support update, they are at the cost of scarifying the lookup performance as they suffer from the overheads brought by imprecise predictions in the leaf nodes. In this paper, we propose LIPP, a brand new framework of learned index to address such issues. Similar with state-of-the-art learned index structures, LIPP is able to support all kinds of index operations, namely lookup query, range query, insert, delete, update and bulkload. Meanwhile, we overcome the limitations of previous studies by properly extending the tree structure when dealing with update operations so as to eliminate the deviation of location predicted by the models in the leaf nodes. Moreover, we further propose a dynamic adjustment strategy to ensure that the height of the tree index is tightly bounded and provide comprehensive theoretical analysis to illustrate it. We conduct an extensive set of experiments on several real-life and synthetic datasets. The results demonstrate that our method consistently outperforms state-of-the-art solutions, achieving by up to 4x for a broader class of workloads with different index operations.
翻译:在现代数据库引擎中,“累积指数”的新范式大大改变了DBMS中设计索引结构的方式。关键的洞察力是,指数可以被视为预测数据集中查找键位置的学习模型。虽然这些研究显示在查找时间和索引大小方面都取得了有希望的结果,但它们无法有效地支持更新操作。虽然最近的研究提出了一些支持更新的初步方法,但它们的代价是,由于叶节点中不准确的预测导致的间接成本使查找业绩受到不利影响。在本文中,我们建议建立一个创新的学习指数框架,以解决这些问题。类似于最先进的学习指数结构,LIPP能够支持所有类型的指数操作,即查找查询、范围查询、插入、删除、更新和大负荷。与此同时,我们克服了以往研究的局限性,在处理更新操作时适当扩展了树结构,以便消除叶节中模型预测的地点偏差。此外,我们进一步提议了一个动态调整战略,即为解决这些问题,与最先进的指数结构相比,我们用一系列的理论模型来展示了一种长期的模型模型式分析结果。