Medical entity span extraction and linking are critical steps for many healthcare NLP tasks. Most existing entity extraction methods either rely on a fixed vocabulary of medical entities or require span annotations. In this paper, we propose a method for linking an open set of entities that does not require any span annotations. Our method, the Open Set Label Attention Transformer (OSLAT), uses a label-attention mechanism to learn candidate-entity-contextualized text representations. We find that OSLAT not only links entities but also implicitly learns the spans associated with them. We evaluate OSLAT on two tasks: (1) span extraction without explicit span annotations, and (2) entity linking without span-level annotations. We test the generalizability of our method by training two separate models on two datasets with low entity overlap and comparing cross-dataset performance.
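To make the label-attention idea concrete, below is a minimal sketch of how a candidate entity (label) embedding could attend over encoder token states to yield an entity-contextualized text representation, with the attention weights acting as an implicit soft span. This is an illustrative PyTorch-style implementation under our own assumptions; names such as `LabelAttention`, `query_proj`, and `key_proj` are hypothetical and not taken from the paper's released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LabelAttention(nn.Module):
    """Attend over token representations with candidate-entity (label) queries,
    producing one entity-contextualized text representation per candidate."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        # Hypothetical single-head bilinear scoring; the paper's exact
        # parameterization may differ.
        self.query_proj = nn.Linear(hidden_dim, hidden_dim)
        self.key_proj = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, token_states, entity_embeds, attention_mask=None):
        # token_states:   (batch, seq_len, hidden)    encoder outputs for the text
        # entity_embeds:  (batch, n_entities, hidden) candidate-entity embeddings
        # attention_mask: (batch, seq_len)            1 for real tokens, 0 for padding
        q = self.query_proj(entity_embeds)                 # (B, E, H)
        k = self.key_proj(token_states)                    # (B, T, H)
        scores = torch.bmm(q, k.transpose(1, 2))           # (B, E, T)
        if attention_mask is not None:
            scores = scores.masked_fill(attention_mask[:, None, :] == 0, float("-inf"))
        attn = F.softmax(scores, dim=-1)                   # soft "span" over tokens
        # Weighted sum of token states = entity-contextualized text representation.
        return torch.bmm(attn, token_states), attn         # (B, E, H), (B, E, T)
```

In this sketch, the returned attention weights over tokens can be inspected or thresholded to recover the spans an entity attends to, which is how a model of this form could learn spans implicitly without span-level supervision.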