Rapid response, that is, low latency, is fundamental in search applications; it is particularly important in interactive search sessions, such as those encountered in conversational settings. One observation with the potential to reduce latency is that conversational queries exhibit temporal locality in the lists of documents retrieved. Motivated by this observation, we propose and evaluate a client-side document embedding cache that improves the responsiveness of conversational search systems. Leveraging state-of-the-art dense retrieval models to abstract document and query semantics, we cache the embeddings of documents retrieved for a topic introduced in the conversation, as they are likely to be relevant to successive queries. Our document embedding cache implements an efficient metric index that answers nearest-neighbor similarity queries by estimating the approximate result sets to be returned. We demonstrate the efficiency of our cache via reproducible experiments based on TREC CAsT datasets, achieving a hit rate of up to 75% without degrading answer quality. These high cache hit rates significantly improve the responsiveness of conversational systems while also reducing the number of queries that must be processed by the search back-end.
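To make the caching mechanism concrete, the following is a minimal sketch of a client-side embedding cache of the kind described above, assuming a dense retriever that embeds queries and documents into a shared inner-product space. The class name, the similarity threshold `tau`, and the miss-handling policy are illustrative assumptions, not the exact implementation evaluated in the paper.

```python
# Minimal sketch of a client-side document-embedding cache for conversational
# search. Assumes query and document embeddings live in a shared vector space
# where inner product approximates relevance; all names here are hypothetical.
import numpy as np

class EmbeddingCache:
    def __init__(self, tau: float = 0.8):
        self.doc_ids: list[str] = []
        self.doc_embs: np.ndarray | None = None  # shape: (n_docs, dim)
        self.tau = tau  # assumed similarity threshold for declaring a hit

    def add(self, doc_ids, doc_embs):
        """Cache the embeddings of documents retrieved for a new topic."""
        self.doc_ids.extend(doc_ids)
        embs = np.asarray(doc_embs, dtype=np.float32)
        self.doc_embs = embs if self.doc_embs is None else np.vstack([self.doc_embs, embs])

    def query(self, q_emb, k: int = 10):
        """Return the k cached documents most similar to the query, or None on a miss."""
        if self.doc_embs is None:
            return None
        scores = self.doc_embs @ np.asarray(q_emb, dtype=np.float32)  # inner-product similarity
        top = np.argsort(-scores)[:k]
        if scores[top[0]] < self.tau:  # best cached match too dissimilar: treat as a miss
            return None
        return [(self.doc_ids[i], float(scores[i])) for i in top]

# Usage over a conversation (back-end retrieval API is hypothetical):
# cache = EmbeddingCache(tau=0.8)
# ids, embs = backend_search(q0_emb, k=100)   # first turn: miss, hit the back-end
# cache.add(ids, embs)                         # cache the retrieved embeddings
# hits = cache.query(q1_emb, k=10)             # later turns: temporal locality makes hits likely
```

In this sketch, a miss sends the query to the search back-end and inserts the returned document embeddings into the cache, so the cache fills with embeddings for the topics raised in the conversation; the temporal locality of conversational queries is what makes subsequent turns likely to be answered from the cache alone.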