Named Entity Recognition (NER) is a low-level task often used as a foundation for solving higher-level NLP problems. In the context of character detection in novels, NER false negatives can be an issue, as they may cause certain characters or relationships to be missed entirely. In this article, we demonstrate that a straightforward data augmentation technique makes it possible to train a model that achieves higher recall, at the cost of some precision on ambiguous entities. We show that this decrease in precision can be mitigated by giving the model more local context, which resolves some of the ambiguities.