Named entity recognition (NER) is a fundamental part of extracting information from documents in biomedical applications. A notable advantage of NER is its consistency in extracting biomedical entities in a document context. Although existing document-level NER models show consistent predictions, they still fall short of expectations. We investigate whether the adjectives and prepositions within an entity cause low label consistency, which in turn results in inconsistent predictions. In this paper, we present ConNER, a method that enhances the label dependency of modifiers (e.g., adjectives and prepositions) to achieve higher label agreement. ConNER refines the draft labels of the modifiers to improve the output representations of biomedical entities. We demonstrate the effectiveness of our method on four popular biomedical NER datasets; in particular, it yields absolute F1 improvements of 7.5-8.6% on two of them. We interpret this as evidence that ConNER is effective on datasets with intrinsically low label consistency. In a qualitative analysis, we show how our approach leads the NER model to generate consistent predictions. Our code and resources are available at https://github.com/dmis-lab/ConNER/.