Text discourse parsing plays an important role in understanding information flow and argumentative structure in natural language, which makes it beneficial for downstream tasks. While previous work has significantly improved the performance of Rhetorical Structure Theory (RST) discourse parsing, existing parsers are not readily applicable to practical use cases: (1) Elementary discourse unit (EDU) segmentation is not integrated into most existing tree parsing frameworks, so it is not straightforward to apply such models to new data. (2) Most parsers cannot be used in multilingual scenarios, because they are developed only for English. (3) Parsers trained on single-domain treebanks do not generalize well to out-of-domain inputs. In this work, we propose a document-level multilingual RST discourse parsing framework that performs EDU segmentation and discourse tree parsing jointly. Moreover, we propose a cross-translation augmentation strategy that enables the framework to support multilingual parsing and improves its domain generality. Experimental results show that our model achieves state-of-the-art performance on document-level multilingual RST parsing in all sub-tasks.