Named entity recognition (NER) is an important task that aims to identify named entities in text and assign them to universal categories, e.g., persons, locations, organizations, and times. Despite its widespread and practical use in many applications, NER is of limited use in domains where general categories are suboptimal, such as engineering or medicine. To facilitate NER of domain-specific types, we propose ANEA, an automated (named) entity annotator that assists human annotators in creating domain-specific NER corpora for German text collections, given a set of domain-specific texts. In our evaluation, we find that ANEA automatically identifies terms that best represent the texts' content, identifies groups of coherent terms, and extracts and assigns descriptive labels to these groups, i.e., annotates text datasets with domain (named) entities.
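To make the three-stage pipeline above concrete (representative terms, then coherent term groups, then a label per group), here is a toy sketch. It is not ANEA's method: the abstract does not specify the techniques, so this sketch swaps in plain TF-IDF for term extraction, greedy complete-linkage grouping by Jaccard similarity of document co-occurrence, and the group medoid as a stand-in "label". ANEA extracts descriptive labels, whereas the medoid merely picks a member term. All function names (`top_terms`, `group_terms`, `label_group`), the `threshold` value, and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tokenize(doc):
    # Naive whitespace tokenizer (assumption); a real German pipeline would
    # need lemmatization and compound splitting.
    return doc.lower().split()


def top_terms(docs, k):
    """Step 1 (stand-in): rank terms by their best TF-IDF score in any document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    best = {}
    for tokens in tokenized:
        for term, count in Counter(tokens).items():
            score = (count / len(tokens)) * math.log(n_docs / df[term])
            best[term] = max(best.get(term, 0.0), score)
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(best, key=lambda t: (-best[t], t))[:k]


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_terms(docs, terms, threshold=0.5):
    """Step 2 (stand-in): greedy complete-linkage grouping by document co-occurrence."""
    token_sets = [set(tokenize(d)) for d in docs]
    # For each term, the set of document indices it occurs in.
    occurs = {t: {i for i, s in enumerate(token_sets) if t in s} for t in terms}
    groups = []
    for term in terms:
        for group in groups:
            # Join the first group whose every member is similar enough.
            if all(jaccard(occurs[term], occurs[m]) >= threshold for m in group):
                group.append(term)
                break
        else:
            groups.append([term])
    return groups, occurs


def label_group(group, occurs):
    """Step 3 (stand-in): use the group's medoid term as its label."""
    return max(group, key=lambda t: sum(jaccard(occurs[t], occurs[m]) for m in group if m != t))


docs = [
    "pumpe und ventil druck pumpe",
    "ventil und druck leitung",
    "patient und diagnose therapie",
    "patient und therapie klinik",
]
terms = top_terms(docs, k=8)
groups, occurs = group_terms(docs, terms)
for group in groups:
    print(label_group(group, occurs), group)
```

On these made-up documents, the function word "und" occurs everywhere, gets an IDF of zero, and is dropped in step 1. The engineering terms (`pumpe`, `druck`, `ventil`) and the medical terms (`diagnose`, `patient`, `therapie`) end up in separate groups. The greedy grouping is order-dependent and leaves `klinik` and `leitung` as singletons, which is one reason a real system would need a more robust clustering and labeling step than this sketch.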