Named entity recognition (NER) is the task of extracting named entities of specific types from text. Current NER models often rely on human-annotated datasets, whose construction requires extensive expert knowledge of the target domain and entity types. This work introduces an ask-to-generate approach, which automatically generates NER datasets by asking simple natural language questions that reflect the needed entity types (e.g., Which disease?) to an open-domain question answering system. Without using any in-domain resources (i.e., training sentences, labels, or in-domain dictionaries), models trained solely on our generated datasets substantially outperform previous weakly supervised models on six NER benchmarks across four different domains. Surprisingly, on NCBI-disease, our model achieves a 75.5 F1 score, outperforming the previous best weakly supervised model, which relies on a rich in-domain dictionary provided by domain experts, by 4.1 F1 points. Formulating the needs of NER with natural language also allows us to build NER models for fine-grained entity types such as Award, where our model even outperforms fully supervised models. On three few-shot NER benchmarks, our model achieves new state-of-the-art performance.
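The sketch below is a minimal, hypothetical illustration of the ask-to-generate idea described in the abstract, not the authors' actual pipeline: a simple type question such as "Which disease?" is sent to an open-domain QA system, and the returned (passage, answer span) pairs are converted into BIO-tagged NER training examples. The function ask_open_domain_qa is an assumed stand-in for whatever retrieval/QA backend is used.

```python
# Hypothetical sketch: generate weak NER training data by asking a type question
# to an open-domain QA system and labeling the answer spans in retrieved passages.

from typing import List, Tuple


def ask_open_domain_qa(question: str, top_k: int = 2) -> List[Tuple[str, str]]:
    """Hypothetical QA backend: returns (passage, answer span) pairs."""
    # Hard-coded toy output for illustration only.
    return [
        ("Aspirin is used to treat rheumatoid arthritis in adults.",
         "rheumatoid arthritis"),
        ("The patient was diagnosed with type 2 diabetes last year.",
         "type 2 diabetes"),
    ][:top_k]


def to_bio_example(passage: str, answer: str, entity_type: str) -> List[Tuple[str, str]]:
    """Convert one (passage, answer) pair into whitespace-tokenized BIO tags."""
    tokens = passage.split()
    answer_tokens = answer.split()
    tags = ["O"] * len(tokens)
    # Naive span matching; a real pipeline would need more careful alignment.
    for i in range(len(tokens) - len(answer_tokens) + 1):
        window = [t.strip(".,;") for t in tokens[i:i + len(answer_tokens)]]
        if window == answer_tokens:
            tags[i] = f"B-{entity_type}"
            for j in range(i + 1, i + len(answer_tokens)):
                tags[j] = f"I-{entity_type}"
            break
    return list(zip(tokens, tags))


if __name__ == "__main__":
    # "Which disease?" expresses the need for Disease-type entities.
    for passage, answer in ask_open_domain_qa("Which disease?"):
        for token, tag in to_bio_example(passage, answer, "Disease"):
            print(f"{token}\t{tag}")
        print()
```

The resulting token/tag pairs could then be fed to a standard sequence-labeling NER model; the abstract's claim is that training only on such automatically generated data already outperforms prior weakly supervised approaches.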