This document describes the annotation guidelines used to construct the Turku Paraphrase Corpus. The guidelines were developed alongside the corpus annotation and were regularly revised and extended as the annotation work progressed. Our paraphrase annotation scheme uses a base scale of 1-4, where labels 1 and 2 mark negative candidates (not paraphrases), while labels 3 and 4 mark paraphrases that hold at least in the given context, if not everywhere. In addition to the base labels, the scheme includes subcategories (flags) that distinguish different types of paraphrases within the two positive labels, making it suitable for more fine-grained paraphrase categorization. The annotation scheme has been used to annotate over 100,000 Finnish paraphrase pairs.
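To make the label structure concrete, the following is a minimal Python sketch of how a label under this scheme might be represented and checked. The string encoding (a base digit followed by single-character flags), the names `ParaphraseLabel` and `parse_label`, and the placeholder flag characters `x` and `y` are assumptions made for illustration only; the actual flag inventory is defined in the guidelines themselves and is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParaphraseLabel:
    """A base label on the 1-4 scale plus optional subcategory flags."""
    base: int
    flags: frozenset

    @property
    def is_paraphrase(self) -> bool:
        # Labels 1 and 2 are negative candidates; 3 and 4 are paraphrases.
        return self.base >= 3


def parse_label(raw: str) -> ParaphraseLabel:
    """Parse a label string such as "4" or "3xy" (hypothetical encoding)."""
    raw = raw.strip()
    if not raw or raw[0] not in "1234":
        raise ValueError(f"base label must be 1-4, got {raw!r}")
    base = int(raw[0])
    # Every character after the base digit is treated as one opaque flag.
    flags = frozenset(raw[1:])
    if flags and base < 3:
        # Flags subcategorize paraphrases, so they only apply to labels 3 and 4.
        raise ValueError(f"flags are only valid on labels 3 and 4, got {raw!r}")
    return ParaphraseLabel(base=base, flags=flags)
```

For example, `parse_label("4").is_paraphrase` is `True` and `parse_label("2").is_paraphrase` is `False`, while a flagged negative label such as `"1x"` is rejected.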