Substantial improvements have been made in machine reading comprehension, where a machine answers questions based on a given context. Current state-of-the-art models even surpass human performance on several benchmarks. However, their abilities in the cross-lingual scenario remain to be explored. Previous work has revealed the abilities of pre-trained multilingual models for zero-shot cross-lingual reading comprehension. In this paper, we further utilize unlabeled data to improve performance. The model is first trained with supervision on a source-language corpus and then self-trained on unlabeled target-language data. Experimental results show improvements for all languages, and we also analyze qualitatively how self-training benefits cross-lingual reading comprehension.
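The following is a minimal sketch of the self-training procedure described above, assuming a generic span-prediction reader with `fit`/`predict` methods; the class names, the confidence threshold, and the number of rounds are illustrative assumptions, not the paper's exact implementation.

```python
# Hypothetical self-training loop for cross-lingual extractive QA.
# QAModel and Example are illustrative stand-ins, not the paper's code.
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Example:
    context: str
    question: str
    answer_span: Optional[Tuple[int, int]] = None  # (start, end) offsets; None if unlabeled


class QAModel:
    """Stand-in interface for a multilingual span-prediction reader."""

    def fit(self, data: List[Example]) -> None:
        ...  # supervised training on labeled examples

    def predict(self, ex: Example) -> Tuple[Tuple[int, int], float]:
        ...  # return the most likely answer span and its confidence


def self_train(model: QAModel,
               source_labeled: List[Example],
               target_unlabeled: List[Example],
               confidence_threshold: float = 0.9,  # assumed filtering heuristic
               rounds: int = 3) -> QAModel:
    # Step 1: supervised training on the source-language corpus.
    model.fit(source_labeled)
    for _ in range(rounds):
        # Step 2: pseudo-label unlabeled target-language data with the current model,
        # keeping only predictions the model is confident about.
        pseudo = []
        for ex in target_unlabeled:
            span, conf = model.predict(ex)
            if conf >= confidence_threshold:
                pseudo.append(Example(ex.context, ex.question, span))
        # Step 3: continue training on source labels plus target pseudo-labels.
        model.fit(source_labeled + pseudo)
    return model
```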