Retrieving relevant items that match users' queries from a billion-scale corpus is the core of industrial e-commerce search systems, in which embedding-based retrieval (EBR) methods prevail. These methods adopt a two-tower framework to learn embedding vectors for queries and items separately, and can therefore leverage efficient approximate nearest neighbor (ANN) search to retrieve relevant items. However, existing EBR methods usually ignore inconsistent user behaviors in industrial multi-stage search systems, resulting in insufficient retrieval efficiency and low commercial return. To tackle this challenge, we propose to improve EBR methods by learning Multi-level Multi-Grained Semantic Embeddings (MMSE). We propose multi-stage information mining to exploit the ordered, clicked, unclicked, and randomly sampled items in practical user behavior data, and then capture query-item similarity via a post-fusion strategy. We then propose multi-grained learning objectives that integrate a retrieval loss, which provides global comparison ability, with a ranking loss, which provides local comparison ability, to generate semantic embeddings. Experiments on a real-world billion-scale dataset and online A/B tests both verify that MMSE achieves significant improvements on metrics such as offline recall and online conversion rate (CVR).
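To make the multi-grained objective concrete, here is a minimal plain-Python sketch of how a global retrieval loss and a local ranking loss could be combined over the four behavior levels named above. Everything in it is an illustrative assumption, not the MMSE implementation. Embeddings are taken as given rather than produced by trained query and item towers. Cosine similarity is used as the score. The retrieval loss is a temperature-scaled softmax cross-entropy. The ranking loss is a pairwise hinge over hypothetical grades (ordered > clicked > unclicked > random). Exact search stands in for the ANN index. The names `LEVELS`, `top_k`, `retrieval_loss`, `ranking_loss`, `multi_grained_loss`, `temperature`, `margin`, and `weight` are all hypothetical, and the post-fusion strategy is not modeled.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical grades for the behavior levels (ordered > clicked > unclicked > random).
# The numeric values are illustrative assumptions.
LEVELS: Dict[str, int] = {"ordered": 3, "clicked": 2, "unclicked": 1, "random": 0}


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between a query embedding and an item embedding."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def top_k(query: Sequence[float], corpus: List[Sequence[float]], k: int) -> List[int]:
    """Exact nearest-neighbor search by cosine; a stand-in for an ANN index."""
    ranked = sorted(range(len(corpus)), key=lambda i: cosine(query, corpus[i]), reverse=True)
    return ranked[:k]


def retrieval_loss(scores: List[float], pos: int, temperature: float = 0.1) -> float:
    """Softmax cross-entropy of one positive against all candidates (global comparison)."""
    logits = [s / temperature for s in scores]
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[pos]


def ranking_loss(scores: List[float], levels: List[int], margin: float = 0.1) -> float:
    """Mean pairwise hinge loss: each higher-level item should outscore each
    lower-level item by at least `margin` (local comparison)."""
    total, pairs = 0.0, 0
    for si, li in zip(scores, levels):
        for sj, lj in zip(scores, levels):
            if li > lj:
                total += max(0.0, margin - (si - sj))
                pairs += 1
    return total / pairs if pairs else 0.0


def multi_grained_loss(query: Sequence[float], items: List[Sequence[float]],
                       labels: List[str], weight: float = 1.0) -> float:
    """Retrieval loss plus a weighted ranking loss for one query."""
    scores = [cosine(query, it) for it in items]
    levels = [LEVELS[label] for label in labels]
    pos = levels.index(max(levels))  # treat the highest-level item as the positive
    return retrieval_loss(scores, pos) + weight * ranking_loss(scores, levels)


if __name__ == "__main__":
    query = [1.0, 0.0]
    items = [[1.0, 0.0], [0.8, 0.6], [0.6, 0.8], [0.0, 1.0]]
    print(top_k(query, items, 2))
    good = multi_grained_loss(query, items, ["ordered", "clicked", "unclicked", "random"])
    bad = multi_grained_loss(query, items, ["random", "unclicked", "clicked", "ordered"])
    print(good < bad)
```

In the usage example, the same embeddings yield a much lower loss when their similarity order agrees with the behavior levels than when it is reversed. The retrieval term compares the positive against every candidate at once, while the ranking term only penalizes locally mis-ordered pairs. This mirrors the global/local split described in the abstract, under the assumptions stated above.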