Short-video platforms have rapidly become a new generation of information retrieval systems, where users formulate queries to access desired videos. However, user queries, especially long-tail ones, often contain spelling errors, incomplete phrasing, and ambiguous intent, which cause mismatches between user expectations and retrieved results. While large language models (LLMs) have shown success in long-tail query rewriting for e-commerce, they struggle on short-video platforms, where proprietary content such as short videos, live streams, micro dramas, and user social networks falls outside their training distribution. To address this challenge, we introduce \textbf{CardRewriter}, an LLM-based framework that incorporates domain-specific knowledge to enhance long-tail query rewriting. For each query, our method aggregates multi-source knowledge relevant to the query and summarizes it into an informative, query-relevant knowledge card. This card then guides the LLM to better capture user intent and produce more effective query rewrites. We optimize CardRewriter with a two-stage training pipeline: supervised fine-tuning followed by group relative policy optimization (GRPO), with a tailored reward system that balances query relevance and retrieval effectiveness. Offline experiments show that CardRewriter substantially improves rewriting quality for queries targeting proprietary content. Online A/B testing further confirms significant gains in long-view rate (LVR) and click-through rate (CTR), along with a notable reduction in initiative query reformulation rate (IQRR). Since September 2025, CardRewriter has been deployed on Kuaishou, one of China's largest short-video platforms, serving hundreds of millions of users daily.
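As a rough illustration of the second training stage, the sketch below shows how a reward that blends query relevance with retrieval effectiveness could be turned into group-relative advantages, the core quantity in group relative policy optimization. It is a minimal sketch under stated assumptions, not the reward system used by CardRewriter: the function names, the linear blend, the `weight` parameter, the example scores, and the choice of population standard deviation are all hypothetical.

```python
from statistics import mean, pstdev


def combined_reward(relevance, retrieval_gain, weight=0.5):
    """Blend two scores, each assumed to lie in [0, 1].

    The linear form and `weight` are assumptions made for illustration;
    the abstract only states that the reward balances the two signals.
    """
    return weight * relevance + (1.0 - weight) * retrieval_gain


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of rewrites sampled for a query.

    Each advantage is (reward - group mean) / (group std + eps), so a
    rewrite is scored relative to its siblings rather than by a critic.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of candidate rewrites for one long-tail query:
# (rewrite text, relevance score, retrieval-effectiveness score)
candidates = [
    ("rewrite A", 0.9, 0.8),
    ("rewrite B", 0.6, 0.7),
    ("rewrite C", 0.3, 0.2),
]

rewards = [combined_reward(rel, gain) for _, rel, gain in candidates]
advantages = group_relative_advantages(rewards)

for (text, _, _), adv in zip(candidates, advantages):
    print(f"{text}: advantage = {adv:+.3f}")
```

Rewrites with above-average reward in their group receive positive advantages and are reinforced, while below-average ones are discouraged. If every rewrite in a group earns the same reward, all advantages are zero and that group contributes no learning signal.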