In this paper, we introduce the resources that we developed for Turkish dependency parsing: a novel manually annotated treebank (the BOUN Treebank), the annotation guidelines we adopted, and a new annotation tool (BoAT). The manual annotation process was designed and carried out by a team of four linguists and five Natural Language Processing (NLP) specialists. Annotation decisions for the BOUN Treebank follow the Universal Dependencies (UD) framework as well as our recent efforts to unify the Turkish UD treebanks through manual re-annotation. To the best of our knowledge, the BOUN Treebank is the largest Turkish treebank. It contains a total of 9,761 sentences from a variety of text types, including biographical texts, national newspapers, instructional texts, popular culture articles, and essays. In addition, we report the parsing results of a state-of-the-art dependency parser on the BOUN Treebank as well as on two other Turkish treebanks. Our results demonstrate that unifying the Turkish annotation scheme and introducing a more comprehensive treebank lead to improved dependency parsing performance.