Learned indexes like the recursive model index (RMI) have recently been introduced as a machine-learned replacement for traditional indexes, with possibly game-changing consequences for how database indexes are constructed and used. Has the time come to discard the good old hand-crafted index structures invented over the past decades? We believe that such a bold claim, with substantial impact on the database world, deserves a deep examination that clarifies when RMIs have benefits and when they do not. We present the first inventor-independent study critically examining RMIs. To do so, we revisit the original paper and carefully reimplement RMIs. We then reproduce the most important experiments from the original paper and from follow-up papers, all of which involve the inventors. We extend the original experiments by adding more baselines and considering more configurations. Further, we give insight into why and when RMIs perform well. Our results show that while the general observation of the original work that "any index is a model of the underlying data" is truly inspiring, some conclusions drawn in the original work may mislead database architects into making unfortunate and overly radical design decisions. In particular, we show that other types of indexes outperform RMIs in some situations. In addition, we show that the performance of RMIs is surprisingly sensitive to different data distributions. We conclude by giving clear guidelines for database architects on when to use RMIs, other learned indexes, or traditional indexes.
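To make the idea that "any index is a model of the underlying data" concrete, the following is a minimal sketch of a two-stage RMI over a sorted array, not the implementation studied here. It assumes simple least-squares linear models at both stages, routing by scaling the root's position estimate to a leaf index, and a symmetric worst-case error bound per leaf; the names `LinearModel`, `TwoStageRMI`, and `num_leaves` are hypothetical.

```python
import bisect


class LinearModel:
    """Least-squares fit of position ~ slope * key + intercept (hypothetical helper)."""

    def __init__(self, keys, positions):
        n = len(keys)
        self.slope, self.intercept = 0.0, 0.0
        if n == 0:
            return  # an empty leaf keeps the constant model 0
        mean_k = sum(keys) / n
        mean_p = sum(positions) / n
        var = sum((k - mean_k) ** 2 for k in keys)
        cov = sum((k - mean_k) * (p - mean_p) for k, p in zip(keys, positions))
        self.slope = cov / var if var > 0 else 0.0
        self.intercept = mean_p - self.slope * mean_k

    def predict(self, key):
        return self.slope * key + self.intercept


class TwoStageRMI:
    """Root model routes a key to one leaf model; the leaf predicts its position."""

    def __init__(self, keys, num_leaves=4):
        self.keys = sorted(keys)
        self.num_leaves = num_leaves
        positions = range(len(self.keys))
        # Stage 1: a single model over all keys, used only for routing.
        self.root = LinearModel(self.keys, list(positions))
        # Stage 2: group keys by the leaf the root routes them to.
        groups = [([], []) for _ in range(num_leaves)]
        for k, p in zip(self.keys, positions):
            ks, ps = groups[self._route(k)]
            ks.append(k)
            ps.append(p)
        # Fit one model per leaf and record its worst-case prediction error.
        self.leaves = []
        for ks, ps in groups:
            model = LinearModel(ks, ps)
            err = max((abs(round(model.predict(k)) - p) for k, p in zip(ks, ps)), default=0)
            self.leaves.append((model, err))

    def _route(self, key):
        # Scale the root's position estimate to a leaf index and clamp it.
        n = max(len(self.keys), 1)
        idx = int(self.root.predict(key) * self.num_leaves / n)
        return min(max(idx, 0), self.num_leaves - 1)

    def lookup(self, key):
        """Return the position of key in the sorted array, or None if absent."""
        n = len(self.keys)
        model, err = self.leaves[self._route(key)]
        guess = round(model.predict(key))
        # The leaf's error bound guarantees a present key lies in [guess - err, guess + err];
        # only that window is binary-searched.
        lo = min(max(guess - err, 0), n)
        hi = min(max(guess + err + 1, lo), n)
        i = bisect.bisect_left(self.keys, key, lo, hi)
        return i if i < n and self.keys[i] == key else None


keys = [x * x for x in range(1000)]  # a non-uniform key distribution
rmi = TwoStageRMI(keys, num_leaves=8)
print(rmi.lookup(144))  # → 12
```

In this sketch, lookup cost depends on how well each leaf's linear model fits its share of the keys: a poorer fit yields a larger stored error and therefore a wider window for the final binary search, which is one way the data distribution enters the picture.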