In modern e-commerce search systems, dense retrieval has become an indispensable component. By computing similarities between query and item (product) embeddings, it efficiently selects candidate products from large-scale repositories. With the breakthroughs in large language models (LLMs), mainstream embedding models have gradually shifted from BERT to LLMs for more accurate text modeling. However, these models still follow a direct-embedding paradigm, and the semantic accuracy of their embeddings remains inadequate; contrastive learning is therefore heavily employed to achieve tight semantic alignment between positive pairs. As a consequence, such models tend to capture statistical co-occurrence patterns in the training data, biasing them toward shallow lexical and semantic matches. For difficult queries that exhibit notable lexical disparity from their target items, performance degrades significantly. In this work, we propose the Large Reasoning Embedding Model (LREM), a novel approach that integrates reasoning processes into representation learning. For difficult queries, LREM first reasons to reach a deep understanding of the original query and then produces a reasoning-augmented query embedding for retrieval. This reasoning process effectively bridges the semantic gap between original queries and target items, significantly improving retrieval accuracy. Specifically, we adopt a two-stage training process: the first stage optimizes the LLM on carefully curated Query-CoT-Item triplets with SFT and InfoNCE losses to establish preliminary reasoning and embedding capabilities, and the second stage further refines the reasoning trajectories via reinforcement learning (RL). Extensive offline and online experiments validate the effectiveness of LREM, which has been deployed on China's largest e-commerce platform since August 2025.
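To make the stage-1 objective concrete, the PyTorch sketch below combines a standard SFT next-token loss over the CoT tokens with an in-batch InfoNCE loss between reasoning-augmented query embeddings and item embeddings. This is a minimal illustration under stated assumptions, not the paper's exact formulation: the function name, the loss weighting, the temperature, and the use of in-batch negatives are all assumptions made for the sketch.

```python
import torch
import torch.nn.functional as F

def lrem_stage1_loss(query_states, item_states, lm_logits, cot_labels,
                     temperature=0.05, sft_weight=1.0):
    """Hypothetical joint stage-1 objective: SFT cross-entropy on the
    Query-CoT token sequence plus an in-batch InfoNCE loss aligning
    reasoning-augmented query embeddings with target item embeddings.

    query_states: (B, D) reasoning-augmented query embeddings
    item_states:  (B, D) item embeddings (positive pairs on the diagonal)
    lm_logits:    (B, T, V) next-token logits over the CoT sequence
    cot_labels:   (B, T) target token ids, -100 at masked positions
    """
    # SFT term: standard next-token cross-entropy over the CoT trajectory.
    sft_loss = F.cross_entropy(
        lm_logits.reshape(-1, lm_logits.size(-1)),
        cot_labels.reshape(-1),
        ignore_index=-100,
    )

    # InfoNCE term: cosine-similarity logits; each query's positive is the
    # matching item row, and every other in-batch item serves as a negative.
    # (In-batch negatives are an assumption; the abstract does not specify
    # the negative-sampling scheme.)
    q = F.normalize(query_states, dim=-1)
    i = F.normalize(item_states, dim=-1)
    logits = q @ i.t() / temperature
    targets = torch.arange(q.size(0), device=q.device)
    infonce_loss = F.cross_entropy(logits, targets)

    return sft_weight * sft_loss + infonce_loss
```

Under these assumptions, the two terms are optimized jointly, so the same LLM learns to generate the reasoning trajectory and to emit an embedding whose nearest in-batch neighbor is the target item.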