We describe the DCU-EPFL submission to the IWPT 2021 Shared Task on Parsing into Enhanced Universal Dependencies. The task involves parsing into Enhanced UD graphs, an extension of basic dependency trees designed to better represent semantic structure. Evaluation is carried out on 29 treebanks in 17 languages, and participants are required to parse the data for each language starting from raw text. Our approach uses the Stanza pipeline to preprocess the text files, XLM-RoBERTa to obtain contextualized token representations, and an edge-scoring and labeling model to predict the enhanced graph. Finally, we run a post-processing script to ensure that all of our outputs are valid Enhanced UD graphs. Our system places 6th out of 9 participants with a coarse Enhanced Labeled Attachment Score (ELAS) of 83.57. We carry out additional post-deadline experiments, which include using Trankit for pre-processing, XLM-RoBERTa-LARGE, treebank concatenation, and multitask learning between a basic and an enhanced dependency parser. All of these modifications improve on our initial score, and our final system has a coarse ELAS of 88.04.
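To make the last two stages of the pipeline concrete, the following is a minimal sketch of how edge scores could be turned into a graph and then repaired. It is an illustration under stated assumptions, not the submitted system: the abstract does not specify the decoding rule or the post-processing steps, so the score threshold, the greedy repair strategy, and the names `decode_graph`, `reachable_from_root`, and `repair` are all hypothetical. The sketch checks only one validity condition (every token is reachable from the artificial root) and omits edge labels.

```python
from collections import deque

def decode_graph(scores, threshold=0.0):
    """Keep every edge whose score exceeds the threshold (assumed rule).

    scores[h][d] is the score of an edge from head h to dependent d.
    Index 0 is the artificial root; tokens are numbered 1..n.
    """
    n = len(scores) - 1
    return {
        (h, d)
        for h in range(n + 1)
        for d in range(1, n + 1)
        if h != d and scores[h][d] > threshold
    }

def reachable_from_root(edges):
    """Return the set of nodes reachable from node 0 via breadth-first search."""
    children = {}
    for h, d in edges:
        children.setdefault(h, []).append(d)
    seen = {0}
    queue = deque([0])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

def repair(edges, scores):
    """Greedily add edges until every token is reachable from the root.

    Each round adds the highest-scoring edge from a reachable node to an
    unreachable one, so at least one new token becomes reachable per round.
    """
    n = len(scores) - 1
    edges = set(edges)
    while True:
        seen = reachable_from_root(edges)
        unreached = [d for d in range(1, n + 1) if d not in seen]
        if not unreached:
            return edges
        best = max(
            ((h, d) for h in seen for d in unreached),
            key=lambda e: scores[e[0]][e[1]],
        )
        edges.add(best)

# Toy scores for a three-token sentence (row = head, column = dependent).
scores = [
    [-9.0, -1.0, 2.0, -3.0],   # root
    [-9.0, -9.0, -2.0, -1.0],  # token 1
    [-9.0, 1.5, -9.0, -0.5],   # token 2
    [-9.0, -4.0, -2.0, -9.0],  # token 3
]
predicted = decode_graph(scores)     # token 3 receives no head here
repaired = repair(predicted, scores)
```

In this toy case thresholding leaves token 3 unattached, and the repair step attaches it to token 2 because that edge has the highest score among the candidates. Because tokens may keep more than one head, the output is a graph rather than a tree, which is the property that distinguishes enhanced from basic dependencies.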