Large-scale embedding-based retrieval (EBR) is the cornerstone of search-related industrial applications. Given a user query, the system of EBR aims to identify relevant information from a large corpus of documents that may be tens or hundreds of billions in size. The storage and computation turn out to be expensive and inefficient with massive documents and high concurrent queries, making it difficult to further scale up. To tackle the challenge, we propose a binary embedding-based retrieval (BEBR) engine equipped with a recurrent binarization algorithm that enables customized bits per dimension. Specifically, we compress the full-precision query and document embeddings, formulated as float vectors in general, into a composition of multiple binary vectors using a lightweight transformation model with residual multilayer perception (MLP) blocks. We can therefore tailor the number of bits for different applications to trade off accuracy loss and cost savings. Importantly, we enable task-agnostic efficient training of the binarization model using a new embedding-to-embedding strategy. We also exploit the compatible training of binary embeddings so that the BEBR engine can support indexing among multiple embedding versions within a unified system. To further realize efficient search, we propose Symmetric Distance Calculation (SDC) to achieve lower response time than Hamming codes. We successfully employed the introduced BEBR to Tencent products, including Sogou, Tencent Video, QQ World, etc. The binarization algorithm can be seamlessly generalized to various tasks with multiple modalities. Extensive experiments on offline benchmarks and online A/B tests demonstrate the efficiency and effectiveness of our method, significantly saving 30%~50% index costs with almost no loss of accuracy at the system level.
翻译:大规模嵌入式检索( EBR) 是搜索相关产业应用程序的基石 。 用户查询, EBR 系统旨在从大量文件库中识别可能高达数十亿或数千亿亿的全方位文件的相关信息。 存储和计算费用昂贵且效率低, 使用大量文档和高并存查询, 因而难以进一步提升。 为了应对这一挑战, 我们提议使用一个基于二进制嵌入( BEBR) 引擎, 配有一种经常性的双进制比特算法, 使每个维度都能实现自定义的比特化比特。 具体地说, 我们把完整精度查询和文档嵌入的嵌入式压缩, 一般来说, 以浮动的向导方向, 使用轻量化的转换模式, 并用精度的多量化方法组成多个二进制二进制的二进制矢量矢量矢量。 因此, 我们可以用新嵌入的双进制系统, 实现升级的升级的升级到升级的系统 。