This article presents an application of the Universal Named Entity framework to the automatic generation of annotated corpora. Using a workflow that extracts Wikipedia data and metadata together with DBpedia information, we generated an English dataset, which we describe and evaluate. We also conducted a set of experiments to improve the annotations in terms of precision, recall, and F1-measure. The final dataset is available, and the established workflow can be applied to any language for which Wikipedia and DBpedia exist. In future research, we intend to continue improving the annotation process and to extend it to other languages.