Spatial objects often carry textual information, such as Points of Interest (POIs) with their descriptions; such data are referred to as geo-textual data. To retrieve such data, spatial keyword queries, which take into account both spatial proximity and textual relevance, have been extensively studied. Existing indexes designed for spatial keyword queries are mostly built on the geo-textual data alone, without considering the distribution of queries already received. However, previous studies have shown that utilizing the known query distribution can improve the index structure for future query processing. In this paper, we propose WISK, a learned index for spatial keyword queries that self-adapts to optimize query costs for a given query workload. One key challenge is how to utilize both structured spatial attributes and unstructured textual information when learning the index. We first divide the data objects into partitions, aiming to minimize the processing cost of the given query workload. We prove the NP-hardness of the partitioning problem and propose a machine learning model to find the optimal partitions. Then, to achieve more pruning power, we build a hierarchical structure on the generated partitions in a bottom-up manner with a reinforcement learning-based approach. We conduct extensive experiments on real-world datasets and query workloads with various distributions, and the results show that WISK outperforms all competitors, achieving up to 8x speedup in querying time with comparable storage overhead.
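To make the partitioning objective concrete, the following is a minimal sketch of how the cost of a query workload could be measured against a given partitioning. It is not WISK's implementation. It assumes a range-style spatial keyword query (a rectangle plus a keyword set, matching objects that lie inside the rectangle and share at least one keyword), and it assumes that cost is the number of objects scanned. The names `GeoObject`, `Query`, `Partition`, `run_query`, and `workload_cost` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class GeoObject:
    """A geo-textual object: a point plus a set of keywords (assumed model)."""
    x: float
    y: float
    keywords: FrozenSet[str]


@dataclass(frozen=True)
class Query:
    """Assumed query form: a rectangle plus a set of query keywords."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    keywords: FrozenSet[str]


@dataclass
class Partition:
    """A non-empty group of objects summarized by its MBR and keyword union."""
    objects: List[GeoObject]
    mbr: Tuple[float, float, float, float] = field(init=False)
    keywords: FrozenSet[str] = field(init=False)

    def __post_init__(self) -> None:
        xs = [o.x for o in self.objects]
        ys = [o.y for o in self.objects]
        self.mbr = (min(xs), min(ys), max(xs), max(ys))
        self.keywords = frozenset().union(*(o.keywords for o in self.objects))


def can_prune(p: Partition, q: Query) -> bool:
    """A partition is skipped if it misses the rectangle or shares no keyword."""
    x_min, y_min, x_max, y_max = p.mbr
    disjoint = (x_max < q.x_min or x_min > q.x_max
                or y_max < q.y_min or y_min > q.y_max)
    return disjoint or not (p.keywords & q.keywords)


def matches(o: GeoObject, q: Query) -> bool:
    """An object matches if it is inside the rectangle and shares a keyword."""
    inside = q.x_min <= o.x <= q.x_max and q.y_min <= o.y <= q.y_max
    return inside and bool(o.keywords & q.keywords)


def run_query(partitions: List[Partition], q: Query) -> Tuple[List[GeoObject], int]:
    """Return the matching objects and the number of objects scanned."""
    results: List[GeoObject] = []
    scanned = 0
    for p in partitions:
        if can_prune(p, q):
            continue
        scanned += len(p.objects)
        results.extend(o for o in p.objects if matches(o, q))
    return results, scanned


def workload_cost(partitions: List[Partition], workload: List[Query]) -> int:
    """Total objects scanned over the workload (the assumed cost measure)."""
    return sum(run_query(partitions, q)[1] for q in workload)
```

Under this assumed cost measure, two partitionings of the same objects can be compared by calling `workload_cost` on each with the same workload; a workload-aware partitioner would prefer the one with the lower total.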