We address cross-lingual Machine Reading Comprehension (MRC) in the direct zero-shot setting by incorporating syntactic features from Universal Dependencies (UD); the key features we use are the syntactic relations within each sentence. While previous work has demonstrated effective syntax-guided MRC models, we propose adopting inter-sentence syntactic relations, in addition to the basic intra-sentence relations, to make fuller use of the syntactic dependencies in the multi-sentence input of the MRC task. In our approach, we build the Inter-Sentence Dependency Graph (ISDG), which connects the dependency trees of individual sentences to form global syntactic relations across sentences. We then propose the ISDG encoder, which encodes this global dependency graph and explicitly addresses inter-sentence relations via both one-hop and multi-hop dependency paths. Experiments on three multilingual MRC datasets (XQuAD, MLQA, TyDiQA-GoldP) show that our encoder, trained only on English, improves zero-shot performance on all 14 test sets covering 8 languages, with up to 3.8 F1 / 5.2 EM improvement on average and up to 5.2 F1 / 11.2 EM on certain languages. Further analysis shows that the improvement can be attributed to attention on cross-linguistically consistent syntactic paths.
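To make the idea of a global dependency graph concrete, the following is a minimal sketch, not the authors' implementation. It assumes CoNLL-U-style input (1-based head indices, 0 for the root) and, purely for illustration, links the roots of consecutive sentences with a hypothetical `cross-sentence` edge; the abstract does not specify how the trees are actually connected. The function names `build_isdg` and `dependency_path` are likewise assumptions.

```python
from collections import deque


def build_isdg(sentences):
    """Merge per-sentence dependency trees into one graph.

    sentences: list of sentences, each a list of (token, head, deprel),
    where head is a 1-based index within the sentence and 0 marks the root.
    Returns (nodes, edges): nodes[i] is the token of global node i, and
    edges[i] is a list of (neighbor id, edge label).
    """
    nodes, edges, roots = [], {}, []
    offset = 0
    for sent in sentences:
        for i, (token, head, rel) in enumerate(sent):
            nid = offset + i
            nodes.append(token)
            edges.setdefault(nid, [])
            if head == 0:
                roots.append(nid)
            else:
                hid = offset + head - 1
                edges.setdefault(hid, [])
                # Store both directions so paths can go up and down the tree.
                edges[nid].append((hid, rel))
                edges[hid].append((nid, rel + ":rev"))
        offset += len(sent)
    # ASSUMPTION: connect roots of consecutive sentences (illustrative only).
    for a, b in zip(roots, roots[1:]):
        edges[a].append((b, "cross-sentence"))
        edges[b].append((a, "cross-sentence"))
    return nodes, edges


def dependency_path(edges, src, dst):
    """Return the edge labels on a shortest path from src to dst, or None."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        cur = queue.popleft()
        if cur == dst:
            break
        for nxt, label in edges[cur]:
            if nxt not in prev:
                prev[nxt] = (cur, label)
                queue.append(nxt)
    if dst not in prev:
        return None
    labels = []
    node = dst
    while prev[node] is not None:
        node, label = prev[node]
        labels.append(label)
    return labels[::-1]


# Two toy sentences: "Mary sleeps" and "She dreams".
sentences = [
    [("Mary", 2, "nsubj"), ("sleeps", 0, "root")],
    [("She", 2, "nsubj"), ("dreams", 0, "root")],
]
nodes, edges = build_isdg(sentences)
path = dependency_path(edges, 0, 2)  # multi-hop path from "Mary" to "She"
print(path)
```

In this toy graph, a path of length one corresponds to a one-hop relation, while the path from "Mary" to "She" crosses the sentence boundary and is multi-hop. How such paths are encoded and attended to is the role of the ISDG encoder, which this sketch does not attempt to reproduce.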