Recent named entity recognition (NER) models often rely on human-annotated datasets, which require extensive expert knowledge of the target domain and entity types. This work introduces an ask-to-generate approach that automatically generates NER datasets by posing simple natural language questions (e.g., "Which disease?") to an open-domain question answering system. Despite using fewer training resources, our models, trained solely on the generated datasets, outperform strong low-resource models by 20.8 F1 points on average across six popular NER benchmarks. Our models also perform competitively with rich-resource models that additionally leverage in-domain dictionaries provided by domain experts. In few-shot NER, we outperform the previous best model by 5.2 F1 points on three benchmarks, achieving new state-of-the-art performance.
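
To make the ask-to-generate idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes the question answering system returns (sentence, answer phrase) pairs for a question, and it labels each answer phrase in its sentence with BIO tags by exact token match. The names `bio_tag`, `generate_dataset`, and `toy_qa_system` are hypothetical, and the toy system with its two hard-coded sentences merely stands in for a real open-domain retriever.

```python
from typing import Callable, Dict, List, Tuple

# Assumed output shape of the QA system: (sentence, answer phrase) pairs.
Retrieved = List[Tuple[str, str]]
TaggedSentence = List[Tuple[str, str]]


def bio_tag(sentence: str, answer: str, entity_type: str) -> TaggedSentence:
    """Tag the first exact occurrence of `answer` in `sentence` with BIO labels."""
    tokens = sentence.split()
    answer_tokens = answer.split()
    tags = ["O"] * len(tokens)
    n = len(answer_tokens)
    if n:
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == answer_tokens:
                tags[i] = f"B-{entity_type}"
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{entity_type}"
                break  # label only the first match
    return list(zip(tokens, tags))


def generate_dataset(
    questions: Dict[str, str],
    qa_system: Callable[[str, int], Retrieved],
    top_k: int = 2,
) -> List[TaggedSentence]:
    """Ask one question per entity type and turn the answers into tagged sentences."""
    dataset = []
    for entity_type, question in questions.items():
        for sentence, answer in qa_system(question, top_k):
            dataset.append(bio_tag(sentence, answer, entity_type))
    return dataset


def toy_qa_system(question: str, top_k: int) -> Retrieved:
    """Hypothetical stand-in for an open-domain QA system (hard-coded answers)."""
    corpus = {
        "Which disease?": [
            ("Patients with cystic fibrosis need long-term care .", "cystic fibrosis"),
            ("Malaria is spread by mosquitoes .", "Malaria"),
        ],
    }
    return corpus.get(question, [])[:top_k]


dataset = generate_dataset({"DISEASE": "Which disease?"}, toy_qa_system)
for tagged_sentence in dataset:
    print(tagged_sentence)
```

In this sketch, each entity type needs only one question, so supporting a new type amounts to adding one entry to the `questions` dictionary. A real pipeline would also have to handle noisy or partial answers, which exact matching ignores.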