Embedding-based retrieval (EBR) is a fundamental building block in many web applications. However, EBR in sponsored search differs from generic scenarios and is technically challenging because it must serve multiple retrieval purposes: first, it has to retrieve high-relevance ads that precisely serve the user's search intent; second, it needs to retrieve high-CTR ads so as to maximize overall user clicks. In this paper, we present Uni-Retriever, a novel representation learning framework developed for Bing Search, which unifies two different training modes, knowledge distillation and contrastive learning, to achieve both objectives. On one hand, the capability for high-relevance retrieval is established by distilling knowledge from the ``relevance teacher model''. On the other hand, the capability for high-CTR retrieval is optimized by learning to discriminate the user's clicked ads from the entire corpus. The two training modes are performed jointly as a multi-objective learning process, such that the generated embeddings favor ads with both high relevance and high CTR. Besides the learning strategy, we also describe our EBR serving pipeline, built upon a substantially optimized DiskANN, which performs massive-scale EBR with high quality and with competitive time and memory efficiency. We conduct comprehensive offline and online experiments to evaluate the proposed techniques, and the findings may provide useful insights for the future development of EBR systems. Uni-Retriever has been deployed as the main retrieval path in Bing's production system, thanks to notable improvements in representation quality and EBR serving quality.
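As a rough illustration of how a distillation objective and a contrastive objective can be combined into one multi-objective loss, consider the minimal sketch below. It is not the Uni-Retriever implementation: the abstract does not specify the loss forms, so the squared-error distillation term, the use of in-batch ads as negatives (standing in for discrimination against the entire corpus), the weighted sum, and all function and parameter names (`contrastive_loss`, `distillation_loss`, `joint_loss`, `weight`) are assumptions made for illustration.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def contrastive_loss(queries, ads):
    """In-batch softmax loss: ads[i] is the clicked ad for queries[i];
    the other ads in the batch act as negatives (an assumption here)."""
    total = 0.0
    for i, q in enumerate(queries):
        scores = [dot(q, a) for a in ads]
        m = max(scores)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[i]  # negative log-likelihood of the clicked ad
    return total / len(queries)


def distillation_loss(queries, ads, teacher_scores):
    """Mean squared error between the student's similarity score and the
    teacher's relevance score for each (query, ad) pair (hypothetical choice)."""
    errs = [(dot(q, a) - t) ** 2 for q, a, t in zip(queries, ads, teacher_scores)]
    return sum(errs) / len(errs)


def joint_loss(queries, ads, teacher_scores, weight=1.0):
    """Weighted sum of the two objectives; `weight` is a hypothetical knob."""
    return contrastive_loss(queries, ads) + weight * distillation_loss(
        queries, ads, teacher_scores
    )


# Toy usage: two queries, each paired with its clicked ad.
queries = [[1.0, 0.0], [0.0, 1.0]]
ads = [[1.0, 0.0], [0.0, 1.0]]
teacher_scores = [1.0, 1.0]  # made-up teacher relevance scores
print(joint_loss(queries, ads, teacher_scores))
```

In this sketch the contrastive term pushes each query embedding toward its clicked ad and away from the other ads, while the distillation term keeps the student's similarity scores close to the teacher's relevance judgments; the weight trades one objective off against the other.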