Modern data warehouses can scale compute nodes independently of storage. These systems persist their data on cloud storage, which is highly available and cost-efficient. Ad-hoc compute nodes then fetch the necessary data on demand from cloud storage. This ability to quickly grow or shrink a data system is highly beneficial when query workloads change over time. We apply this new architecture to search engines, with a focus on optimizing their latencies in cloud environments. However, simply placing existing search engines (e.g., Apache Lucene) on top of cloud storage significantly increases their end-to-end query latencies (i.e., more than 6 seconds on average in one of our studies). This is because their indexes can incur multiple network round-trips due to their hierarchical structure (e.g., skip lists, B-trees, learned indexes). To address this issue, we develop a new statistical index (called IoU Sketch). For lookup, IoU Sketch makes multiple asynchronous network requests in parallel. While IoU Sketch may fetch more bytes than existing indexes, it significantly reduces the index lookup time because the parallel requests do not block each other. Based on IoU Sketch, we build an end-to-end search engine, called Airphant; we describe how Airphant builds, optimizes, and manages IoU Sketch, and ultimately supports keyword-based querying. In our experiments with four real datasets, Airphant's average end-to-end latencies are between 13 milliseconds and 300 milliseconds, up to 8.97x faster than Apache Lucene and 113.39x faster than Elasticsearch.
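The latency argument above can be illustrated with a minimal sketch (not the paper's actual implementation; all names and the simulated latency are illustrative): a hierarchical index must serialize its round-trips because each level's result determines the next fetch, whereas an IoU-Sketch-style lookup knows all candidate locations up front and can issue its requests concurrently.

```python
import asyncio
import time

LATENCY = 0.05  # simulated per-request network round-trip (seconds)

async def fetch(key):
    # Stand-in for a cloud-storage GET: each request pays one round-trip.
    await asyncio.sleep(LATENCY)
    return f"block-{key}"

async def sequential_lookup(keys):
    # Hierarchical index traversal: each level depends on the previous
    # fetch, so the round-trips are serialized.
    results = []
    for k in keys:
        results.append(await fetch(k))
    return results

async def parallel_lookup(keys):
    # Sketch-style lookup: candidate locations are known in advance,
    # so all requests are issued concurrently and overlap.
    return await asyncio.gather(*(fetch(k) for k in keys))

def timed(coro):
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start

if __name__ == "__main__":
    _, t_seq = timed(sequential_lookup(range(4)))
    _, t_par = timed(parallel_lookup(range(4)))
    print(f"sequential: {t_seq:.2f}s  parallel: {t_par:.2f}s")
```

With four simulated round-trips, the sequential lookup takes roughly four times the single-request latency, while the parallel lookup finishes in about one round-trip, even though both fetch the same number of blocks; this is the trade of extra bytes for fewer blocking waits described above.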