In this paper, we describe the approach behind our submission to the MIRACL challenge, a WSDM 2023 Cup competition centered on ad-hoc retrieval across 18 diverse languages. Our solution comprises two neural models. The first is a bi-encoder re-ranker, to which we apply a cross-lingual distillation technique to transfer ranking knowledge from English to the target language space. The second is a cross-encoder re-ranker trained on multilingual retrieval data generated with neural machine translation. We further fine-tune both models on the MIRACL training data and ensemble multiple rank lists to obtain the final result. On the MIRACL leaderboard, our approach ranks 8th on the Test-A set and 2nd on the Test-B set among the 16 known languages.
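The final step combines rank lists from the two re-rankers. The abstract does not specify the fusion method, so the sketch below uses reciprocal rank fusion (RRF) purely as an illustration; the function name, the constant `k`, and the toy document ids are all assumptions, not the authors' actual procedure.

```python
# Illustrative sketch of ensembling multiple rank lists via reciprocal
# rank fusion (RRF). This is an assumed fusion scheme, not necessarily
# the one used in the submission.
from collections import defaultdict


def rrf_ensemble(rank_lists, k=60):
    """Fuse several best-first ranked lists of document ids.

    A document's fused score is the sum of 1 / (k + rank) over every
    list in which it appears; higher fused score ranks earlier.
    """
    scores = defaultdict(float)
    for ranking in rank_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Toy example: fuse a bi-encoder list with a cross-encoder list.
fused = rrf_ensemble([["d1", "d2", "d3"], ["d2", "d3", "d1"]])
```

In this toy case `d2` wins because it ranks highly in both lists, even though neither re-ranker placed it uniformly first.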