Learned index structures have been shown to achieve favorable lookup performance and space consumption compared to their traditional counterparts such as B-trees. However, most learned index studies have focused on the primary indexing setting, where the base data is sorted. In this work, we investigate whether learned indexes sustain their advantage in the secondary indexing setting. We introduce Learned Secondary Index (LSI), a first attempt to use learned indexes for indexing unsorted data. LSI works by building a learned index over a permutation vector, which allows binary search to be performed on the unsorted base data using random access. We additionally augment LSI with a fingerprint vector to accelerate equality lookups. We show that LSI achieves lookup performance comparable to state-of-the-art secondary indexes while being up to 6× more space-efficient.
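As a rough illustration of the lookup path described above, here is a minimal Python sketch. It is not the LSI implementation: a single least-squares line stands in for whatever learned model LSI actually uses, and the class and method names, the 8-bit fingerprint width, and the choice to scan fingerprints linearly inside the model's error window are all assumptions made for this example. The base data is never reordered; the permutation vector and the fingerprint vector are the only extra structures.

```python
class LearnedSecondaryIndex:
    """Toy learned secondary index over an unsorted numeric column (sketch only)."""

    FP_MASK = 0xFF  # assumption: 8-bit fingerprints

    def __init__(self, base):
        self.base = base  # unsorted base data; it is never reordered
        # Permutation vector: row ids of `base` in ascending key order.
        # sorted() is stable, so equal keys keep ascending row-id order.
        self.perm = sorted(range(len(base)), key=lambda row: base[row])
        sorted_keys = [base[row] for row in self.perm]
        self._fit(sorted_keys)
        # Fingerprint vector (assumed layout): one small hash per perm slot.
        self.fps = [self._fp(k) for k in sorted_keys]

    def _fp(self, key):
        return hash(key) & self.FP_MASK

    def _fit(self, keys):
        """Fit one least-squares line mapping key -> position in `perm`."""
        n = len(keys)
        if n == 0:
            self.slope, self.intercept, self.err = 0.0, 0.0, 0
            return
        mean_k = sum(keys) / n
        mean_p = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in keys)
        cov = sum((k - mean_k) * (p - mean_p) for p, k in enumerate(keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # The worst-case prediction error over all indexed keys bounds the
        # window in which any indexed key's position can lie.
        self.err = max(abs(self._predict(k) - p) for p, k in enumerate(keys))

    def _predict(self, key):
        return int(round(self.slope * key + self.intercept))

    def _window(self, key):
        """Half-open [lo, hi) range of `perm` positions to search."""
        p = self._predict(key)
        return max(0, p - self.err), min(len(self.perm), p + self.err + 1)

    def lookup_binary(self, key):
        """Equality lookup: lower-bound binary search inside the error window.

        Each probe dereferences `perm` and then randomly accesses `base`.
        Returns the smallest row id holding `key`, or None.
        """
        lo, hi = self._window(key)
        while lo < hi:
            mid = (lo + hi) // 2
            if self.base[self.perm[mid]] < key:
                lo = mid + 1
            else:
                hi = mid
        if lo < len(self.perm) and self.base[self.perm[lo]] == key:
            return self.perm[lo]
        return None

    def lookup_fingerprint(self, key):
        """Equality lookup: scan fingerprints in the window and access `base`
        only when a fingerprint matches. Returns the smallest row id or None."""
        lo, hi = self._window(key)
        target = self._fp(key)
        for pos in range(lo, hi):
            if self.fps[pos] == target and self.base[self.perm[pos]] == key:
                return self.perm[pos]
        return None


base = [42, 7, 19, 88, 3, 19, 61, 25]
lsi = LearnedSecondaryIndex(base)
print(lsi.perm)                    # [4, 1, 2, 5, 7, 0, 6, 3]
print(lsi.lookup_binary(19))       # 2
print(lsi.lookup_fingerprint(19))  # 2
print(lsi.lookup_binary(50))       # None
```

In this sketch the binary-search path pays one random access into the base data per probe, while the fingerprint path compares small in-index hashes first and dereferences the base data only on a fingerprint match, which is one plausible way a fingerprint vector can cut base-data accesses for equality lookups.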