The Coronavirus (COVID-19) pandemic has led to a rapidly growing 'infodemic' of health information online. This has created a need for accurate semantic search and retrieval of reliable COVID-19 information across millions of documents in multiple languages. To address this challenge, this paper proposes a novel neural Multistage BiCross encoder approach with high precision and high recall. It is a sequential three-stage ranking pipeline that uses the Okapi BM25 retrieval algorithm, a transformer-based bi-encoder, and a transformer-based cross-encoder to rank documents effectively with respect to a given query. We present experimental results from our participation in the Multilingual Information Access (MLIA) shared task on COVID-19 multilingual semantic search. The independently evaluated MLIA results validate our approach and show that it outperforms other state-of-the-art approaches on nearly all evaluation metrics, for both monolingual and bilingual runs.
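The three-stage idea (cheap lexical recall, then progressively more expensive neural re-ranking of shrinking candidate lists) can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. The transformer models are replaced by toy stand-ins: a bag-of-words cosine similarity plays the bi-encoder (query and document encoded independently), and a bigram-overlap heuristic plays the cross-encoder (query and document scored jointly). The cutoffs `k_bm25` and `k_bi`, the BM25 parameters `k1=1.5` and `b=0.75`, the Lucene-style idf, and all helper names are assumptions for illustration. The abstract does not say whether stage scores are combined, so each stage here simply re-ranks the previous stage's short list.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (assumption; real systems use proper analyzers).
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of every document for the query (Lucene-style idf, always >= 0)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms & set(tf):
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


def embed(text):
    # Hypothetical bi-encoder stand-in: a bag-of-words count vector.
    # Each text is encoded on its own, so document vectors could be precomputed.
    return Counter(tokenize(text))


def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def cross_score(query, doc):
    # Hypothetical cross-encoder stand-in: looks at the (query, doc) pair jointly,
    # rewarding term overlap plus query bigrams that appear in order in the doc.
    q, d = tokenize(query), tokenize(doc)
    doc_bigrams = set(zip(d, d[1:]))
    query_bigrams = list(zip(q, q[1:]))
    overlap = len(set(q) & set(d)) / len(set(q))
    bigram = sum(bg in doc_bigrams for bg in query_bigrams) / max(len(query_bigrams), 1)
    return overlap + bigram


def multistage_rank(query, docs, k_bm25=100, k_bi=10):
    # Stage 1: BM25 over the whole collection keeps the top k_bm25 candidates.
    bm25 = bm25_scores(query, docs)
    stage1 = sorted(range(len(docs)), key=lambda i: bm25[i], reverse=True)[:k_bm25]
    # Stage 2: bi-encoder similarity re-ranks the candidates and keeps the top k_bi.
    q_vec = embed(query)
    stage2 = sorted(stage1, key=lambda i: cosine(q_vec, embed(docs[i])), reverse=True)[:k_bi]
    # Stage 3: the (most expensive) joint scorer re-ranks only the short list.
    return sorted(stage2, key=lambda i: cross_score(query, docs[i]), reverse=True)


docs = [
    "side effects of the covid vaccine",
    "weather forecast for the weekend",
    "covid vaccine side effects reported in clinical trials",
    "vaccine distribution logistics",
]
print(multistage_rank("covid vaccine side effects", docs, k_bm25=3, k_bi=2))  # → [2, 0]
```

In this toy run, BM25 and the bi-encoder stand-in both prefer the shorter document 0, while the joint scorer promotes document 2 because it contains the whole query phrase in order. This mirrors the usual motivation for the design: cross-encoders are too costly to apply to millions of documents, so cheaper stages first narrow the candidate set.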