Nested entities occur in many domains because entities are compositional, and the widely used sequence labeling framework cannot easily recognize them. A natural solution is to treat the task as a span classification problem. To improve span representation and classification, it is crucial to effectively integrate all useful information of different formats, which we refer to as heterogeneous factors, including tokens, labels, boundaries, and related spans. To fuse these heterogeneous factors, we propose a novel triaffine mechanism, comprising triaffine attention and triaffine scoring, which lets multiple factors interact in both the representation and classification stages. Experimental results show that our proposed method achieves state-of-the-art F1 scores on four nested NER datasets: ACE2004, ACE2005, GENIA, and KBP2017.
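
To make the idea of a triaffine interaction concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a triaffine score is a trilinear form over three vectors under a 3-way weight tensor, with a constant 1 appended to two of the inputs as a bias term, and that triaffine attention pools token vectors using a softmax over such scores against two boundary vectors. The function names `triaffine_score` and `triaffine_attention`, the tensor shapes, and the bias convention are all illustrative assumptions.

```python
import numpy as np


def triaffine_score(u, v, w, W):
    """Trilinear score of three vectors under a 3-way tensor W.

    Assumed shapes (illustrative): u (a,), w (b,), v (c,), W (a+1, b, c+1).
    A constant 1 is appended to u and v so that W also carries bias terms.
    """
    u1 = np.append(u, 1.0)
    v1 = np.append(v, 1.0)
    # Contract W with u1 on axis 0, w on axis 1, and v1 on axis 2.
    return float(np.einsum("i,ijk,j,k->", u1, W, w, v1))


def triaffine_attention(head, tail, tokens, W):
    """Pool token vectors with weights derived from triaffine scores.

    head, tail: boundary vectors of a candidate span (hypothetical inputs).
    tokens: array of shape (n, b), one row per token inside the span.
    Returns the pooled vector and the attention weights.
    """
    scores = np.array([triaffine_score(head, tail, x, W) for x in tokens])
    scores = scores - scores.max()  # subtract the max for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ tokens, weights
```

In this sketch each score depends jointly on both boundaries and one token, which is the kind of three-way interaction that a biaffine (two-input) form cannot express. A label embedding or a related-span vector could take the place of any of the three inputs in the same way.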