In this paper, we propose a robust multilingual model to improve the quality of search results. Our model not only leverages a processed, class-balanced dataset but also benefits from multitask pre-training, which yields more general representations. In the pre-training stage, we adopt a masked language modeling (MLM) task, a classification task, and a contrastive learning task to achieve considerable performance. In the fine-tuning stage, we use confident learning, exponential moving average (EMA), adversarial training with the fast gradient method (FGM), and the regularized dropout strategy (R-Drop) to improve the model's generalization and robustness. Moreover, we use multi-granular semantic units to mine the textual metadata of queries and products, enhancing the model's representations. Our approach obtained competitive results, ranking in the top 8 across three tasks. We release the source code and pre-trained models associated with this work.
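To make the fine-tuning techniques named above concrete, the following is a minimal PyTorch-style sketch of FGM adversarial training, R-Drop regularization, and EMA weight averaging, assuming a generic dropout-equipped classifier; all names and hyperparameters (model, epsilon, alpha, decay, the embedding parameter name) are illustrative defaults, not the values used in this work.

```python
import torch
import torch.nn.functional as F

def r_drop_loss(model, inputs, labels, alpha=4.0):
    """R-Drop: two stochastic forward passes; cross-entropy plus a
    symmetric KL term that pulls the two dropout-perturbed
    distributions together."""
    logits1 = model(**inputs)
    logits2 = model(**inputs)  # dropout makes this pass differ from the first
    ce = F.cross_entropy(logits1, labels) + F.cross_entropy(logits2, labels)
    kl = F.kl_div(F.log_softmax(logits1, dim=-1), F.softmax(logits2, dim=-1),
                  reduction="batchmean")
    kl += F.kl_div(F.log_softmax(logits2, dim=-1), F.softmax(logits1, dim=-1),
                   reduction="batchmean")
    return ce + alpha * kl / 2

class FGM:
    """Fast Gradient Method: perturb the word-embedding weights along the
    normalized gradient direction, then restore them after the extra
    backward pass."""
    def __init__(self, model, epsilon=1.0, emb_name="word_embeddings"):
        self.model, self.epsilon, self.emb_name = model, epsilon, emb_name
        self.backup = {}

    def attack(self):
        for name, p in self.model.named_parameters():
            if p.requires_grad and self.emb_name in name and p.grad is not None:
                self.backup[name] = p.data.clone()
                norm = torch.norm(p.grad)
                if norm != 0:
                    p.data.add_(self.epsilon * p.grad / norm)

    def restore(self):
        for name, p in self.model.named_parameters():
            if name in self.backup:
                p.data = self.backup[name]
        self.backup = {}

class EMA:
    """Exponential moving average of parameters; the shadow weights are
    swapped in at evaluation time for a smoother final model."""
    def __init__(self, model, decay=0.999):
        self.decay = decay
        self.shadow = {n: p.data.clone() for n, p in model.named_parameters()
                       if p.requires_grad}

    def update(self, model):
        for n, p in model.named_parameters():
            if n in self.shadow:
                self.shadow[n].mul_(self.decay).add_(p.data, alpha=1 - self.decay)
```

A typical training step under this sketch would compute `r_drop_loss` and call `backward()`, then `fgm.attack()`, re-compute the loss and `backward()` again to accumulate the adversarial gradient, `fgm.restore()`, step the optimizer, and finally `ema.update(model)`.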