Dense retrieval has shown great success in passage ranking in English. However, its effectiveness in document retrieval for non-English languages remains unexplored due to limited training resources. In this work, we explore different transfer techniques for document ranking from English annotations to multiple non-English languages. Our experiments on test collections in six languages (Chinese, Arabic, French, Hindi, Bengali, Spanish) from diverse language families reveal that zero-shot model-based transfer using mBERT improves search quality in non-English monolingual retrieval. We also find that weakly supervised target-language transfer yields competitive performance against generation-based target-language transfer, which requires external translators and query generators.
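To make the zero-shot model-based transfer concrete, the sketch below encodes a non-English query and candidate documents with an mBERT encoder and ranks them by embedding similarity. This is a minimal illustration only: it assumes the encoder has already been fine-tuned on English relevance data (e.g. MS MARCO), and the [CLS]-pooling and cosine scoring choices are illustrative assumptions, not details taken from this work.

```python
# Minimal sketch of zero-shot dense retrieval with mBERT.
# Assumption (not from the paper): the encoder was fine-tuned on English
# relevance data; here we only apply it unchanged to non-English text.
import torch
from transformers import AutoTokenizer, AutoModel

MODEL_NAME = "bert-base-multilingual-cased"  # mBERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
encoder = AutoModel.from_pretrained(MODEL_NAME)
encoder.eval()

def encode(texts):
    """Encode texts into dense vectors using the [CLS] representation."""
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=256, return_tensors="pt")
    with torch.no_grad():
        out = encoder(**batch)
    return out.last_hidden_state[:, 0]  # [CLS] embedding per text

# A French query and documents, scored by the English-trained encoder --
# this is the zero-shot transfer step to a non-English language.
query_vec = encode(["Quelle est la capitale de la France ?"])
doc_vecs = encode([
    "Paris est la capitale et la plus grande ville de la France.",
    "Le café est une boisson obtenue à partir de graines torréfiées.",
])
scores = torch.nn.functional.cosine_similarity(query_vec, doc_vecs)
print(scores.argsort(descending=True).tolist())  # ranked document indices
```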