In this paper, we present NEREL, a Russian dataset for named entity recognition and relation extraction. NEREL is significantly larger than existing Russian datasets: to date, it contains 56K annotated named entities and 39K annotated relations. An important difference from previous datasets is the annotation of nested named entities, as well as of relations within nested entities and at the discourse level. NEREL can facilitate the development of novel models that extract relations between nested named entities, as well as relations at both the sentence and document levels. NEREL also annotates events involving named entities and the roles of those entities in the events. The NEREL collection is available at https://github.com/nerel-ds/NEREL.