Text discourse parsing plays an important role in understanding information flow and argumentative structure in natural language. Previous research under Rhetorical Structure Theory (RST) has mostly focused on inducing and evaluating models on the English treebank. However, parsing in other languages, such as German, Dutch, and Portuguese, remains challenging due to the shortage of annotated data. In this work, we investigate two approaches to building a neural, cross-lingual discourse parser: (1) utilizing multilingual vector representations; and (2) adopting segment-level translation of the source content. Experimental results show that both methods are effective even with limited training data, and that they achieve state-of-the-art performance on all sub-tasks of cross-lingual, document-level discourse parsing.