State-of-the-art neural (re)rankers are notoriously data-hungry, which -- given the lack of large-scale training data in languages other than English -- makes them rarely used in multilingual and cross-lingual retrieval settings. Current approaches therefore commonly transfer rankers trained on English data to other languages and cross-lingual setups by means of multilingual encoders: they fine-tune all parameters of pretrained massively multilingual Transformers (MMTs, e.g., multilingual BERT) on English relevance judgments, and then deploy them in the target language(s). In this work, we show that two parameter-efficient approaches to cross-lingual transfer, namely Sparse Fine-Tuning Masks (SFTMs) and Adapters, allow for a more lightweight and more effective zero-shot transfer to multilingual and cross-lingual retrieval tasks. We first train language adapters (or SFTMs) via Masked Language Modelling and then train retrieval (i.e., reranking) adapters (or SFTMs) on top, while keeping all other parameters fixed. At inference, this modular design allows us to compose the ranker by applying the (re)ranking adapter (or SFTM) trained with source language data together with the language adapter (or SFTM) of a target language. We carry out a large-scale evaluation on the CLEF-2003 and HC4 benchmarks and additionally, as another contribution, extend the former with queries in three new languages: Kyrgyz, Uyghur and Turkish. The proposed parameter-efficient methods outperform standard zero-shot transfer with full MMT fine-tuning, while being more modular and reducing training times. The gains are particularly pronounced for low-resource languages, where our approaches also substantially outperform the competitive machine translation-based rankers.
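To make the modular composition concrete, the sketch below shows, in plain PyTorch, how a bottleneck-style language adapter and a reranking adapter can be stacked inside a frozen Transformer layer, with only the adapters trainable. This is an illustrative approximation under common assumptions about adapter design (down-projection, non-linearity, up-projection, residual), not the authors' implementation; the class and parameter names (BottleneckAdapter, AdapterComposedLayer, reduction_factor) are hypothetical.

```python
# Minimal sketch of MAD-X-style adapter composition: a (re)ranking adapter trained on
# English relevance data is stacked on top of a target-language adapter inside each
# Transformer layer, while the MMT backbone stays frozen. Illustrative only.
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """Down-project -> non-linearity -> up-project, with a residual connection."""

    def __init__(self, hidden_size: int, reduction_factor: int = 16):
        super().__init__()
        bottleneck = hidden_size // reduction_factor
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        return hidden + self.up(self.act(self.down(hidden)))


class AdapterComposedLayer(nn.Module):
    """Wraps one frozen backbone layer and stacks language + reranking adapters."""

    def __init__(self, backbone_layer: nn.Module, hidden_size: int):
        super().__init__()
        self.backbone_layer = backbone_layer
        for p in self.backbone_layer.parameters():  # keep the MMT weights fixed
            p.requires_grad = False
        self.lang_adapter = BottleneckAdapter(hidden_size)  # trained via MLM on the target language
        self.rank_adapter = BottleneckAdapter(hidden_size)  # trained on English relevance judgments

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        hidden = self.backbone_layer(hidden)
        hidden = self.lang_adapter(hidden)  # swap this module to change the target language
        return self.rank_adapter(hidden)


# Toy usage: nn.Linear stands in for a real Transformer layer purely for illustration.
layer = AdapterComposedLayer(nn.Linear(768, 768), hidden_size=768)
hidden_out = layer(torch.randn(2, 128, 768))  # (batch, seq_len, hidden)
```

At zero-shot inference time, only the language adapter module is swapped for the target language while the reranking adapter trained on source-language data is reused unchanged, which is what makes the transfer both modular and lightweight.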