The coronavirus (COVID-19) pandemic has led to a rapidly growing 'infodemic' online. Accurate retrieval of reliable, relevant data from millions of documents about COVID-19 has therefore become an urgent need for the general public as well as for other stakeholders. The COVID-19 Multilingual Information Access (MLIA) initiative is a joint effort to improve the exchange of COVID-19-related information by developing applications and services through research and community participation. In this work, we present a search system called Multistage BiCross-Encoder, developed by team GATE for MLIA Task 2, Multilingual Semantic Search. Multistage BiCross-Encoder is a sequential three-stage pipeline that uses the Okapi BM25 algorithm, a transformer-based bi-encoder, and a transformer-based cross-encoder to rank documents with respect to the query. The results of round 1 show that our models achieve state-of-the-art performance on all ranking metrics for both monolingual and bilingual runs.
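
The three-stage structure described above can be illustrated with a minimal sketch. It is not the system's implementation: only the BM25 stage follows the standard Okapi formula, while `toy_bi_encoder` and `toy_cross_encoder` are hypothetical bag-of-words stand-ins for the transformer models, and the cutoffs `n_bm25` and `n_bi` are assumed values that the text does not specify. The sketch only shows how each stage narrows and re-scores the candidates passed on by the previous one.

```python
import math
from collections import Counter


def tokenize(text):
    """Whitespace tokenizer; an assumption, real systems use richer analysis."""
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of every document in `docs` for `query`."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    doc_freq = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            df = doc_freq[term]
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def top_k(indices, scores, k):
    """Return the k indices with the highest scores, best first."""
    ranked = sorted(zip(indices, scores), key=lambda p: p[1], reverse=True)
    return [i for i, _ in ranked[:k]]


def toy_bi_encoder(text):
    """Hypothetical stand-in for a bi-encoder: embeds one text on its own."""
    return Counter(tokenize(text))


def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[t] * v[t] for t in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def toy_cross_encoder(query, doc):
    """Hypothetical stand-in for a cross-encoder: scores the pair jointly."""
    q, d = set(tokenize(query)), set(tokenize(doc))
    return len(q & d) / len(q | d) if q | d else 0.0


def multistage_search(query, docs, n_bm25=100, n_bi=10):
    """Rank `docs` for `query`; returns document indices, best first."""
    # Stage 1: cheap lexical retrieval over the whole collection.
    candidates = top_k(range(len(docs)), bm25_scores(query, docs), n_bm25)
    # Stage 2: query and documents are encoded independently, then compared.
    q_vec = toy_bi_encoder(query)
    sims = [cosine(q_vec, toy_bi_encoder(docs[i])) for i in candidates]
    candidates = top_k(candidates, sims, n_bi)
    # Stage 3: each (query, document) pair is scored jointly for the final order.
    pair_scores = [toy_cross_encoder(query, docs[i]) for i in candidates]
    return top_k(candidates, pair_scores, len(candidates))
```

Calling `multistage_search(query, docs)` returns document indices in ranked order. The ordering of the stages reflects a common cost trade-off: the most expensive scorer, which must process every query-document pair, is applied only to the small set that survives the two cheaper stages.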