Query Auto-Completion (QAC) is a ubiquitous feature of modern textual search systems, suggesting possible ways of completing the query being typed by the user. Efficiency is crucial for the system to respond in real time when operating over a million-scale search space. Prior work has extensively advocated the use of a trie data structure for fast prefix-search operations in compact space. However, searching by prefix has little discovery power, in that only completions prefixed by the query are returned. This may negatively impact the effectiveness of the QAC system, with a consequent monetary loss for real applications like Web Search Engines and eCommerce. In this work we describe the implementation that powers a new QAC system at eBay, and discuss its efficiency and effectiveness in relation to other state-of-the-art approaches. The solution is based on the combination of an inverted index with succinct data structures, a much less explored direction in the literature. This system is replacing the previous implementation based on Apache SOLR, which was not always able to meet the required service-level agreement.
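To make the contrast between prefix search and an inverted-index approach concrete, here is a minimal sketch. It is not the eBay implementation: the toy query log, the function names, and the matching rule (every complete query term must appear in the completion, and the last token must be a prefix of some term) are assumptions made for illustration. Plain Python sets stand in for the succinct data structures, and a sorted list with binary search stands in for the trie.

```python
from bisect import bisect_left
from collections import defaultdict

# Hypothetical toy query log; the list position serves as the completion id.
COMPLETIONS = [
    "iphone 11 case",
    "case for iphone 11",
    "iphone charger",
    "samsung case",
]


def prefix_search(query, completions=COMPLETIONS):
    """Return ids of completions that start with `query`.

    A sorted list plus binary search answers the same question a trie would.
    """
    ordered = sorted(range(len(completions)), key=lambda i: completions[i])
    keys = [completions[i] for i in ordered]
    pos = bisect_left(keys, query)
    hits = []
    while pos < len(keys) and keys[pos].startswith(query):
        hits.append(ordered[pos])
        pos += 1
    return sorted(hits)


def build_inverted_index(completions=COMPLETIONS):
    """Map each term to the set of completion ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(completions):
        for term in text.split():
            index[term].add(doc_id)
    return index


def conjunctive_search(query, index, completions=COMPLETIONS):
    """Return ids of completions that contain every complete query term
    and at least one term starting with the last (possibly partial) token.
    """
    tokens = query.split()
    if not tokens:
        return []
    *terms, last = tokens
    candidates = set(range(len(completions)))
    for term in terms:
        candidates &= index.get(term, set())
    # Union of the posting sets of all terms prefixed by the last token.
    suffix_hits = set().union(
        *(ids for term, ids in index.items() if term.startswith(last))
    )
    return sorted(candidates & suffix_hits)


if __name__ == "__main__":
    index = build_inverted_index()
    print(prefix_search("iphone ca"))
    print(conjunctive_search("iphone ca", index))
```

On this toy log, the query "iphone ca" has no prefix match, while the conjunctive variant still reaches "iphone 11 case" and "case for iphone 11". This is the kind of added discovery power the abstract refers to. A production system would additionally rank results by score and encode the posting lists compactly.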