Extractive Reading Comprehension (ERC) has made tremendous advances, enabled by the availability of large-scale, high-quality ERC training data. Despite this rapid progress and widespread application, datasets in languages other than high-resource languages such as English remain scarce. To address this issue, we propose a Cross-Lingual Transposition ReThinking (XLTT) model that models existing high-quality extractive reading comprehension datasets in a multilingual environment. Specifically, we present multilingual adaptive attention (MAA), which combines intra-attention and inter-attention to learn more general semantic and lexical knowledge from each pair of language families. Furthermore, to make full use of existing datasets, we adopt a new training framework that trains our model by calculating task-level similarities between each existing dataset and the target dataset. Experimental results show that our XLTT model surpasses six baselines on two multilingual ERC benchmarks and is especially effective for low-resource languages, with average improvements of 3.9 F1 and 4.1 EM.
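As a rough illustration of how intra-attention and inter-attention could be combined for a pair of language families, the sketch below uses a learned per-token gate over two standard multi-head attention layers. This is only a minimal, hedged sketch in PyTorch: the module name, the gating rule, and all tensor shapes are assumptions for illustration and are not the paper's actual MAA formulation.

```python
import torch
import torch.nn as nn


class AdaptiveAttentionSketch(nn.Module):
    """Illustrative sketch: fuse intra- and inter-attention with a learned gate.

    NOTE: this is NOT the paper's MAA module; the gating rule and shapes are
    assumptions made for the sake of a runnable example.
    """

    def __init__(self, hidden_dim: int, num_heads: int = 8):
        super().__init__()
        # Intra-attention: tokens attend within their own language-family encoding.
        self.intra_attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        # Inter-attention: tokens attend to the paired language family's encoding.
        self.inter_attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        # Gate decides, per token, how much cross-lingual context to mix in.
        self.gate = nn.Linear(2 * hidden_dim, 1)

    def forward(self, src: torch.Tensor, paired: torch.Tensor) -> torch.Tensor:
        # src, paired: (batch, seq_len, hidden_dim) encodings of the two families.
        intra_out, _ = self.intra_attn(src, src, src)
        inter_out, _ = self.inter_attn(src, paired, paired)
        g = torch.sigmoid(self.gate(torch.cat([intra_out, inter_out], dim=-1)))
        return g * intra_out + (1.0 - g) * inter_out


# Minimal usage example with random encodings.
fusion = AdaptiveAttentionSketch(hidden_dim=768)
src = torch.randn(2, 128, 768)     # e.g. source-language-family contexts
paired = torch.randn(2, 128, 768)  # e.g. target-language-family contexts
fused = fusion(src, paired)        # (2, 128, 768) gated combination
```

In this hypothetical formulation, the sigmoid gate lets each token decide how much cross-lingual (inter) context to blend with its monolingual (intra) representation; the abstract does not specify the actual fusion mechanism.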