The recursive model index (RMI) has recently been introduced as a machine-learned replacement for traditional indexes over sorted data, achieving remarkably fast lookups. Follow-up work focused on explaining RMI's performance and automatically configuring RMIs through enumeration. Unfortunately, configuring RMIs involves setting several hyperparameters, the enumeration of which is often too time-consuming in practice. Therefore, in this work, we conduct the first inventor-independent broad analysis of RMIs with the goal of understanding the impact of each hyperparameter on performance. In particular, we show that in addition to model types and layer size, error bounds and search algorithms must be considered to achieve the best possible performance. Based on our findings, we develop a simple-to-follow guideline for configuring RMIs. We evaluate our guideline by comparing the resulting RMIs with a number of state-of-the-art indexes, both learned and traditional. We show that our simple guideline is sufficient to achieve competitive performance with other learned indexes and RMIs whose configuration was determined using an expensive enumeration procedure. In addition, while carefully reimplementing RMIs, we are able to improve the build time by 2.5x to 6.3x.
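To make the hyperparameters mentioned above concrete, the following is a minimal, hypothetical sketch of a two-layer RMI lookup in C++. All names and structural choices (linear models, per-leaf error bounds, binary last-mile search) are illustrative assumptions, not the paper's implementation: a root model routes the key to a second-layer model, that model predicts a position, and the key is then located by searching only within the recorded error bounds around the prediction. Model training is omitted.

// Minimal sketch of a two-layer RMI lookup (illustrative only).
#include <algorithm>
#include <cstdint>
#include <vector>

struct LinearModel {
    double slope = 0.0, intercept = 0.0;
    double predict(double key) const { return slope * key + intercept; }
};

struct TwoLayerRMI {
    std::vector<uint64_t> keys;          // sorted data
    LinearModel root;                    // layer 1: routes keys to leaves
    std::vector<LinearModel> leaves;     // layer 2: predicts positions
    std::vector<size_t> err_lo, err_hi;  // per-leaf error bounds

    // Pick the second-layer model responsible for this key.
    size_t leaf_of(uint64_t key) const {
        double p = root.predict(static_cast<double>(key));
        long idx = static_cast<long>(p * leaves.size() / keys.size());
        return std::clamp<long>(idx, 0, static_cast<long>(leaves.size()) - 1);
    }

    // Lookup: predict a position, then search only within the error bounds.
    size_t lookup(uint64_t key) const {
        size_t l = leaf_of(key);
        double p = leaves[l].predict(static_cast<double>(key));
        size_t pos = std::clamp<long>(static_cast<long>(p), 0,
                                      static_cast<long>(keys.size()) - 1);
        size_t lo = pos > err_lo[l] ? pos - err_lo[l] : 0;
        size_t hi = std::min(pos + err_hi[l] + 1, keys.size());
        // "Last-mile" search inside [lo, hi); binary search is one option.
        auto it = std::lower_bound(keys.begin() + lo, keys.begin() + hi, key);
        return static_cast<size_t>(it - keys.begin());
    }
};

int main() {
    // Toy usage with hand-set models and deliberately loose error bounds.
    TwoLayerRMI rmi;
    rmi.keys = {2, 3, 5, 7, 11, 13, 17, 19};
    rmi.root = {0.0, 0.0};               // route everything to leaf 0
    rmi.leaves = {{0.45, 0.0}};          // rough position estimate: 0.45 * key
    rmi.err_lo = {8};
    rmi.err_hi = {8};
    return rmi.lookup(13) == 5 ? 0 : 1;  // key 13 sits at index 5
}

In this sketch, the number of second-layer models, the model type, the error-bound representation, and the last-mile search routine (e.g., replacing std::lower_bound by a linear or exponential search over the same interval) correspond to the hyperparameters whose impact the paper analyzes.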