Learning from Data to Speed-up Sorted Table Search Procedures: Methodology and Practical Guidelines

Sorted Table Search Procedures are the quintessential query-answering tool, with widespread usage that now also includes Web Applications, e.g., Search Engines (Google Chrome) and ad Bidding Systems (AppNexus). Speeding them up, at very little cost in space, is still a significant achievement. Here we study to what extent Machine Learning Techniques can contribute to such a speed-up, via a systematic experimental comparison of known efficient implementations of Sorted Table Search procedures, with different Data Layouts, and their Learned counterparts developed here. We characterize the scenarios in which the latter can be profitably used in place of the former, accounting for both CPU and GPU computing. Our approach also contributes to the study of Learned Data Structures, a recent proposal to improve the time/space performance of fundamental Data Structures, e.g., B-trees, Hash Tables, Bloom Filters. Indeed, we also formalize an Algorithmic Paradigm of Learned Dichotomic Sorted Table Search procedures that naturally complements the Learned one proposed here and that characterizes most of the known Sorted Table Search Procedures as having a "learning phase" that approximates Simple Linear Regression.
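
To make the idea of a Learned Sorted Table Search procedure concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a non-empty sorted table of numeric keys, fits a Simple Linear Regression from key to position, records the largest prediction error over the table, and then restricts binary search to a window around the predicted position. The names `fit_linear_model` and `learned_search` are hypothetical, and the fallback to a full binary search is an added safety check rather than something taken from the paper.

```python
import math
from bisect import bisect_left


def fit_linear_model(table):
    """Least-squares fit of position ~ slope * key + intercept.

    Assumes `table` is a non-empty sorted list of numbers.
    Returns (slope, intercept, eps), where eps is the largest
    absolute prediction error over the table, rounded up.
    """
    n = len(table)
    mean_key = sum(table) / n
    mean_pos = (n - 1) / 2
    var = sum((k - mean_key) ** 2 for k in table)
    if var == 0:
        slope = 0.0  # all keys equal: the model degenerates to a constant
    else:
        cov = sum((k - mean_key) * (i - mean_pos) for i, k in enumerate(table))
        slope = cov / var
    intercept = mean_pos - slope * mean_key
    eps = math.ceil(max(abs(slope * k + intercept - i) for i, k in enumerate(table)))
    return slope, intercept, eps


def learned_search(table, model, query):
    """Return the leftmost insertion position of `query` in `table`."""
    slope, intercept, eps = model
    n = len(table)
    pred = slope * query + intercept
    # Search window around the predicted position, clamped to the table.
    lo = min(n, max(0, math.floor(pred) - eps - 1))
    hi = max(lo, min(n, math.ceil(pred) + eps + 2))
    pos = bisect_left(table, query, lo, hi)
    # Verify that the window answer is also valid for the whole table.
    left_ok = pos > lo or lo == 0 or table[lo - 1] < query
    right_ok = pos < hi or hi == n or table[hi] >= query
    if not (left_ok and right_ok):
        pos = bisect_left(table, query)  # safety fallback: full binary search
    return pos


table = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
model = fit_linear_model(table)
print(learned_search(table, model, 11))  # → 4
```

The better the linear model fits the key distribution, the smaller `eps` becomes and the fewer comparisons the final binary search needs; on poorly fitting data the window grows and the procedure behaves like plain binary search.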
