Learned Indexes are a novel approach to search in a sorted table. A model is used to predict an interval in which to search into and a Binary Search routine is used to finalize the search. They are quite effective. For the final stage, usually, the lower_bound routine of the Standard C++ library is used, although this is more of a natural choice rather than a requirement. However, recent studies, that do not use Machine Learning predictions, indicate that other implementations of Binary Search or variants, namely k-ary Search, are better suited to take advantage of the features offered by modern computer architectures. With the use of the Searching on Sorted Sets SOSD Learned Indexing benchmarking software, we investigate how to choose a Search routine for the final stage of searching in a Learned Index. Our results provide indications that better choices than the lower_bound routine can be made. We also highlight how such a choice may be dependent on the computer architecture that is to be used. Overall, our findings provide new and much-needed guidelines for the selection of the Search routine within the Learned Indexing framework.
翻译:学习索引是一种在分类表格中搜索的新办法。 模型用来预测搜索间隔, 并使用二进位搜索程序来完成搜索。 它们相当有效 。 在最后阶段, 通常使用标准 C++ 库的较低限制的例行程序, 虽然这更多地是一种自然选择, 而不是要求。 但是, 最近的研究, 不使用机器学习预测, 显示其他二进搜索或变体, 即 k-ary 搜索, 更适合利用现代计算机架构提供的功能。 使用 搜索 排序的设置 SOSD SOSD Science 索引基准软件, 我们调查如何选择一个搜索程序, 用于在学习索引中搜索的最后阶段。 我们的结果显示, 选择比较低限制的例行程序更好。 我们还强调, 这种选择可能取决于将要使用的计算机架构。 总体而言, 我们的研究结果为在确定索引框架内选择搜索程序提供了新的和非常需要的指导方针 。