Learned Sorted Table Search and Static Indexes in Small Model Space

Machine Learning techniques, properly combined with Data Structures, have resulted in Learned Static Indexes: innovative and powerful tools that speed up Binary Search at the cost of additional space with respect to the table being searched. This space is devoted to the Machine Learning model. Although in their infancy, Learned Static Indexes are methodologically and practically important, due to the pervasiveness of Sorted Table Search procedures. In modern applications, model space is a key factor and, in fact, a major open question in this area is to assess to what extent one can enjoy the speed-up of Binary Search achieved by Learned Indexes while using constant or nearly constant space models. In this paper, we investigate this question by (a) introducing two new models, namely the Learned k-ary Search Model and the Synoptic Recursive Model Index; and (b) systematically exploring the time-space trade-offs of a hierarchy of existing models, i.e., the ones in the reference software platform Searching on Sorted Data, together with the new ones proposed here. By adhering to and extending the current benchmarking methodology, we experimentally show that the Learned k-ary Search Model can speed up Binary Search in constant additional space. Our second model, together with the bi-criteria Piece-wise Geometric Model index, can achieve a speed-up of Binary Search with a model space that adds only 0.05% to the space taken by the table, being competitive in terms of time-space trade-off with existing proposals. The Synoptic Recursive Model Index and the bi-criteria Piece-wise Geometric Model complement each other quite well across the various levels of the internal memory hierarchy. Finally, our findings stimulate further research in this area, since they highlight the need for more studies of the time-space relation in Learned Indexes.
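
To make the general idea of learned sorted table search concrete, the following is a minimal Python sketch under stated assumptions. It is not the Learned k-ary Search Model or the Synoptic Recursive Model Index, whose details are not given here. It fits a single least-squares line that maps a key to its approximate position, records the maximum prediction error, and then runs Binary Search only inside the resulting window. The names `fit_linear` and `learned_search` are hypothetical, and the table is assumed to be sorted with distinct numeric keys.

```python
import bisect


def fit_linear(table):
    """Fit position ~ slope * key + intercept and record the max error.

    Assumes `table` is a non-empty sorted list of distinct numeric keys.
    The returned model is three numbers, i.e. constant space.
    """
    n = len(table)
    mean_x = sum(table) / n
    mean_y = (n - 1) / 2
    var_x = sum((x - mean_x) ** 2 for x in table)
    cov = sum((x - mean_x) * (i - mean_y) for i, x in enumerate(table))
    slope = cov / var_x if var_x else 0.0
    intercept = mean_y - slope * mean_x
    # Largest gap between the true position and the predicted one.
    max_err = max(abs(i - (slope * x + intercept)) for i, x in enumerate(table))
    # One extra slot absorbs the truncation done at query time.
    return slope, intercept, int(max_err) + 1


def learned_search(table, model, key):
    """Return the index of `key` in `table`, or -1 if it is absent."""
    slope, intercept, eps = model
    n = len(table)
    guess = int(slope * key + intercept)
    # Restrict Binary Search to the window the model guarantees.
    lo = min(max(0, guess - eps), n)
    hi = max(lo, min(n, guess + eps + 1))
    i = bisect.bisect_left(table, key, lo, hi)
    return i if i < n and table[i] == key else -1


table = [3, 8, 15, 16, 23, 42, 57, 91]
model = fit_linear(table)
positions = [learned_search(table, model, k) for k in table]
```

In this sketch the search cost depends on the window width `2 * eps + 1` rather than on the table size, so the benefit over plain Binary Search hinges on how well a single line approximates the key distribution. That dependence is one way to read the time-space question raised above: richer models shrink the window but take more space.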
