Multilingual e-commerce search suffers from severe data imbalance across languages, label noise, and limited supervision for low-resource languages--challenges that impede the cross-lingual generalization of relevance models despite the strong capabilities of large language models (LLMs). In this work, we present a practical, architecture-agnostic, data-centric framework to enhance performance on two core tasks: Query-Category (QC) relevance (matching queries to product categories) and Query-Item (QI) relevance (matching queries to product titles). Rather than altering the model, we redesign the training data through three complementary strategies: (1) translation-based augmentation to synthesize examples for languages absent from the training set, (2) semantic negative sampling to generate hard negatives and mitigate class imbalance, and (3) self-validation filtering to detect and remove likely mislabeled instances. Evaluated on the CIKM AnalytiCup 2025 dataset, our approach consistently yields substantial F1 score improvements over strong LLM baselines, achieving competitive results in the official competition. Our findings demonstrate that systematic data engineering can be as impactful as--and often more deployable than--complex model modifications, offering actionable guidance for building robust multilingual search systems in real-world e-commerce settings.
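The semantic negative sampling strategy named above can be illustrated with a minimal sketch: rank the non-relevant candidates for a query by their embedding similarity to that query and keep the most similar ones as hard negatives. This is not the paper's implementation; the toy bag-of-words embedding and the helper names (`embed`, `hard_negatives`) are assumptions for illustration, and a production system would use a multilingual sentence encoder instead.

```python
from collections import Counter
from math import sqrt

def embed(text):
    # Toy bag-of-words embedding; stands in for a multilingual encoder.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def hard_negatives(query, candidates, positives, k=2):
    # Score each non-relevant candidate against the query and keep the
    # top-k most similar ones: these are the "hard" negatives.
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in candidates if c not in positives]
    scored.sort(reverse=True)
    return [c for _, c in scored[:k]]

query = "wireless bluetooth headphones"
candidates = ["wired headphones", "bluetooth speaker", "garden hose", "phone case"]
print(hard_negatives(query, candidates, positives=set(), k=2))
```

Candidates that share surface vocabulary with the query ("wired headphones", "bluetooth speaker") rank above unrelated ones ("garden hose"), which is what makes them informative negatives for training.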