We present our contribution to the EvaLatin shared task, the first evaluation campaign devoted to NLP tools for Latin. We submitted a system based on UDPipe 2.0, one of the winning systems of the CoNLL 2018 Shared Task, the 2018 Shared Task on Extrinsic Parser Evaluation, and the SIGMORPHON 2019 Shared Task. Our system places first by a wide margin in both lemmatization and POS tagging in the open modality, where additional supervised data is allowed; in this setting we utilize all Universal Dependencies Latin treebanks. In the closed modality, where only the EvaLatin training data is allowed, our system achieves the best performance in lemmatization and in the classical subtask of POS tagging, while reaching second place in the cross-genre and cross-time settings. In ablation experiments, we also evaluate the influence of BERT and XLM-RoBERTa contextualized embeddings, and of the treebank encodings of the different flavors of Latin treebanks.