Inverted indexes continue to be a mainstay of text search engines, allowing efficient querying of large document collections. While a number of index organizations are possible, document-ordered indexes are the most common, since they are amenable to various query types, support index updates, and allow for efficient dynamic pruning operations. One disadvantage of document-ordered indexes is that high-scoring documents can be distributed across the document identifier space, meaning that index traversal algorithms that terminate early might put search effectiveness at risk. The alternative is impact-ordered indexes, which primarily support top-k disjunctions, but also allow for anytime query processing, where the search can be terminated at any time, with search quality improving as processing latency increases. Anytime query processing can effectively reduce high-percentile tail latency, which is essential for operational scenarios in which a service level agreement (SLA) imposes response time requirements. In this work, we show how document-ordered indexes can be organized such that they can be queried in an anytime fashion, enabling strict latency control with effective early termination. Our experiments show that processing document-ordered topical segments selected by a simple score estimator outperforms existing anytime algorithms, and allows query runtimes to be accurately limited in order to comply with SLA requirements.
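To make the idea of anytime processing over topical segments concrete, the following is a minimal sketch in Python, not the method evaluated in this work. It assumes a toy index layout in which each topical segment holds its own document-ordered postings, and it assumes a very simple score estimator (the sum of each query term's maximum score within the segment). The names `Segment`, `estimate_segment_score`, and `anytime_top_k` are hypothetical, and each segment is scored exhaustively, whereas a real system would likely apply dynamic pruning inside a segment.

```python
import heapq
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical toy layout: one dict per topical segment, mapping
# term -> postings list of (doc_id, score), sorted by doc_id.
Segment = Dict[str, List[Tuple[int, float]]]


def estimate_segment_score(segment: Segment, query: List[str]) -> float:
    """Assumed estimator: sum of each query term's maximum score in the segment."""
    return sum(
        max((score for _, score in segment.get(term, [])), default=0.0)
        for term in query
    )


def anytime_top_k(
    segments: List[Segment],
    query: List[str],
    k: int,
    budget_seconds: float,
    clock: Callable[[], float] = time.perf_counter,
) -> List[Tuple[int, float]]:
    """Visit segments in decreasing estimated score until the budget expires.

    The budget is only checked between segments in this sketch, so the
    latency bound is approximate rather than strict.
    """
    deadline = clock() + budget_seconds
    ranked = sorted(
        segments,
        key=lambda seg: estimate_segment_score(seg, query),
        reverse=True,
    )
    heap: List[Tuple[float, int]] = []  # min-heap of (score, doc_id)
    for segment in ranked:
        if clock() >= deadline:
            break  # early termination: return the best results found so far
        # Exhaustive disjunctive scoring of the documents in this segment.
        accumulators: Dict[int, float] = {}
        for term in query:
            for doc_id, score in segment.get(term, []):
                accumulators[doc_id] = accumulators.get(doc_id, 0.0) + score
        for doc_id, score in accumulators.items():
            if len(heap) < k:
                heapq.heappush(heap, (score, doc_id))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, doc_id))
    # Highest score first; ties broken by ascending doc_id.
    return [(doc_id, score) for score, doc_id in sorted(heap, key=lambda e: (-e[0], e[1]))]
```

Because the most promising segments are visited first, stopping after any number of segments still returns the best candidates seen so far, which is the property that lets a time budget be traded against result quality.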