This paper describes our winning solution (team name: www) to the Amazon ESCI Challenge of KDD CUP 2022, which achieved an NDCG score of 0.9043 and took first place on Task 1, the query-product ranking track. Participants in this competition are given a real-world, large-scale, multilingual shopping queries data set containing query-product pairs in English, Japanese, and Spanish. The competition comprises three tasks: ranking the results list (Task 1), classifying query-product pairs into Exact, Substitute, Complement, or Irrelevant (ESCI) categories (Task 2), and identifying substitute products for a given query (Task 3). We focus mainly on Task 1 and propose a semantic alignment system for multilingual query-product retrieval. We adopt pre-trained multilingual language models (LMs) to obtain semantic representations of queries and products. All of our models are first trained with cross-entropy loss to classify query-product pairs into the four ESCI categories; we then take a weighted sum of the four class probabilities as the ranking score. To further improve the models, we apply careful data preprocessing, data augmentation by translation, separate handling of English texts with English LMs, adversarial training with adversarial weight perturbation (AWP) and the fast gradient method (FGM), self-distillation, pseudo-labeling, label smoothing, and ensembling. Our solution outperforms the other entries on both the public and private leaderboards.
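As an illustration of the scoring step described above, the following minimal sketch turns four-class ESCI logits into a single ranking score by taking a weighted sum of the softmax probabilities. The class weights, the label order, and the function names (`ranking_score`, `rank_products`) are assumptions made for this example; the abstract does not state the weights actually used.

```python
import math

# Assumed label order of the classifier's output logits (hypothetical).
LABELS = ("exact", "substitute", "complement", "irrelevant")

# Hypothetical class weights; the weights used in the actual solution
# are not given in the abstract.
ESCI_WEIGHTS = {
    "exact": 1.0,
    "substitute": 0.1,
    "complement": 0.01,
    "irrelevant": 0.0,
}


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ranking_score(logits, weights=ESCI_WEIGHTS):
    """Weighted sum of the four class probabilities."""
    probs = softmax(logits)
    return sum(weights[label] * p for label, p in zip(LABELS, probs))


def rank_products(candidates, weights=ESCI_WEIGHTS):
    """Sort (product_id, logits) pairs by descending ranking score."""
    ordered = sorted(
        candidates,
        key=lambda item: ranking_score(item[1], weights),
        reverse=True,
    )
    return [product_id for product_id, _ in ordered]
```

For a single query, `rank_products` would be called with the classifier logits of every candidate product; a product that the model considers most likely Exact is placed ahead of one it considers most likely Substitute, which in turn precedes one it considers most likely Irrelevant.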