STARS: Semantic Tokens with Augmented Representations for Recommendation at Scale

Real-world ecommerce recommender systems must deliver relevant items under strict tens-of-milliseconds latency constraints despite challenges such as cold-start products, rapidly shifting user intent, and dynamic context including seasonality, holidays, and promotions. We introduce STARS, a transformer-based sequential recommendation framework built for large-scale, low-latency ecommerce settings. STARS combines several innovations: dual-memory user embeddings that separate long-term preferences from short-term session intent; semantic item tokens that fuse pretrained text embeddings, learnable deltas, and LLM-derived attribute tags, strengthening content-based matching, long-tail coverage, and cold-start performance; context-aware scoring with learned calendar and event offsets; and a latency-conscious two-stage retrieval pipeline that performs offline embedding generation and online maximum inner-product search with filtering, enabling tens-of-milliseconds response times. In offline evaluations on production-scale data, STARS improves Hit@5 by more than 75 percent relative to our existing LambdaMART system. A large-scale A/B test on 6 million visits shows statistically significant lifts, including Total Orders +0.8%, Add-to-Cart on Home +2.0%, and Visits per User +0.5%. These results demonstrate that combining semantic enrichment, multi-intent modeling, and deployment-oriented design can yield state-of-the-art recommendation quality in real-world environments without sacrificing serving efficiency.

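To make the pipeline in the abstract concrete, the following is a minimal sketch of how the pieces could fit together, not the authors' implementation. The abstract does not specify how the components are combined, so every formula here is an assumption: the item token is taken as the sum of the pretrained text embedding, the learnable delta, and the mean tag embedding; the user query is a weighted blend of long-term and short-term memory; the context signal is a per-item additive offset; and maximum inner-product search is done exhaustively, where a production system would use an approximate index. All function names, parameters, and data are hypothetical.

```python
from typing import Dict, List, Sequence, Set

Vector = List[float]


def add(*vectors: Sequence[float]) -> Vector:
    """Element-wise sum of equally sized vectors."""
    return [sum(parts) for parts in zip(*vectors)]


def scale(vector: Sequence[float], weight: float) -> Vector:
    """Multiply every component by a scalar weight."""
    return [weight * x for x in vector]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def mean(vectors: Sequence[Sequence[float]], dim: int) -> Vector:
    """Average of the given vectors, or a zero vector if there are none."""
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def item_token(text_emb: Vector, delta: Vector, tag_embs: List[Vector]) -> Vector:
    # Assumed fusion rule: text embedding + learnable delta + mean tag embedding.
    return add(text_emb, delta, mean(tag_embs, len(text_emb)))


def user_query(long_term: Vector, short_term: Vector, session_weight: float = 0.5) -> Vector:
    # Assumed blend of the two memories; session_weight is a hypothetical knob.
    return add(scale(long_term, 1.0 - session_weight), scale(short_term, session_weight))


def retrieve(query: Vector, item_vectors: Dict[str, Vector],
             context_offsets: Dict[str, float], allowed: Set[str], k: int) -> List[str]:
    """Exhaustive inner-product search with a filter and additive context offsets."""
    scores = {
        item_id: dot(query, vector) + context_offsets.get(item_id, 0.0)
        for item_id, vector in item_vectors.items()
        if item_id in allowed  # filtering step, e.g. in-stock items only
    }
    # Highest score first; ties broken by item id for determinism.
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))[:k]


# Offline stage: item tokens are computed ahead of time (toy 2-d data).
ITEM_VECTORS = {
    "mug": item_token([1.0, 0.0], [0.0, 0.0], []),
    "scarf": item_token([0.0, 1.0], [0.0, 0.1], [[0.0, 0.2]]),
    "lamp": item_token([0.5, 0.5], [0.0, 0.0], []),
}

# Online stage: build the query, then search with filtering and offsets.
QUERY = user_query(long_term=[1.0, 0.0], short_term=[0.0, 1.0])
IN_STOCK = {"mug", "scarf"}
PROMO_OFFSETS = {"mug": 0.3}  # hypothetical promotion boost

print(retrieve(QUERY, ITEM_VECTORS, PROMO_OFFSETS, IN_STOCK, k=2))  # ['mug', 'scarf']
```

In this toy run the promotion offset lifts "mug" above "scarf"; with no offsets the order reverses, which illustrates how a learned calendar or event term can change ranking without re-embedding items.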