Embedding-based retrieval (EBR) represents queries and documents as embeddings and thereby converts retrieval into a nearest-neighbor search problem in the embedding space. Previous work has mainly focused on representing each web page with a single embedding, but in real web search scenarios it is difficult to capture all the information of a long, complexly structured web page in a single embedding. To address this issue, we design a click feedback-aware web page summarization for multi-embedding-based retrieval (CPS-MEBR) framework, which generates multiple embeddings for each web page so that it can match different potential queries. Specifically, we use users' click data from search logs to train a summarization model that extracts the sentences of a web page that users click frequently, since these sentences are more likely to answer potential queries. Meanwhile, we introduce sentence-level semantic interaction into the design of a multi-embedding-based retrieval (MEBR) model, which uses the frequently clicked sentences of a web page to generate multiple embeddings that handle different potential queries. Offline experiments show that this approach retrieves higher-quality candidates than a single-embedding-based retrieval (SEBR) model.
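The difference between single- and multi-embedding retrieval can be illustrated with a small sketch. This is a hypothetical illustration, not the CPS-MEBR implementation. It assumes each page already has one embedding per extracted sentence, and it scores a page by the maximum cosine similarity between the query and any of that page's embeddings, a common multi-vector aggregation choice that is assumed here rather than taken from the paper. Exhaustive search stands in for an approximate nearest-neighbor index, mean pooling stands in for a single-embedding baseline, and the names `mebr_search`, `mean_pool` and the toy vectors are all illustrative.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_pool(embeddings: List[Vector]) -> Vector:
    """Hypothetical single-embedding baseline: average a page's embeddings."""
    n = len(embeddings)
    return [sum(col) / n for col in zip(*embeddings)]


def mebr_search(query: Vector, index: Dict[str, List[Vector]], k: int) -> List[Tuple[str, float]]:
    """Brute-force multi-embedding search: a page's score is the best
    (max) cosine similarity over its embeddings; return the top-k pages."""
    scored = []
    for doc_id, embeddings in index.items():
        if not embeddings:  # skip pages with no embeddings
            continue
        best = max(cosine(query, e) for e in embeddings)
        scored.append((doc_id, best))
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy index: page_a has two sentence embeddings, page_b has one.
    index = {"page_a": [[1.0, 0.0], [0.0, 1.0]], "page_b": [[0.8, 0.6]]}
    query = [1.0, 0.0]

    multi = mebr_search(query, index, k=2)
    single_index = {d: [mean_pool(v)] for d, v in index.items()}
    single = mebr_search(query, single_index, k=2)

    print([d for d, _ in multi])   # ['page_a', 'page_b']
    print([d for d, _ in single])  # ['page_b', 'page_a']
```

With these toy vectors, the multi-embedding index ranks `page_a` first because one of its sentence embeddings matches the query exactly. Averaging `page_a` into a single vector dilutes that match (cosine of about 0.71 versus 0.8 for `page_b`), so the single-embedding ranking puts `page_b` first.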