Named entity recognition (NER), also known as entity identification, entity chunking, and entity extraction, is a subtask of information extraction that aims to locate named entities mentioned in unstructured text and classify them into predefined categories such as person names, place names, organization names, and other proper nouns.
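Concretely, an NER system assigns each token a label, commonly in the BIO scheme (B- marks the beginning of an entity, I- a continuation, O a token outside any entity). A minimal sketch in Python; the sentence, tag inventory, and helper function are illustrative, not from any particular toolkit:

```python
# A toy illustration of NER output in the BIO tagging scheme.
tokens = ["Barack", "Obama", "visited", "Peking", "University", "in", "Beijing", "."]
tags   = ["B-PER",  "I-PER", "O",       "B-ORG",  "I-ORG",      "O",  "B-LOC",   "O"]

def extract_entities(tokens, tags):
    """Collect (entity_text, entity_type) pairs from BIO-tagged tokens."""
    entities, current, etype = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:  # close the previous entity span
                entities.append((" ".join(current), etype))
            current, etype = [tok], tag[2:]
        elif tag.startswith("I-") and current:
            current.append(tok)  # extend the current entity span
        else:
            if current:
                entities.append((" ".join(current), etype))
            current, etype = [], None
    if current:
        entities.append((" ".join(current), etype))
    return entities

print(extract_entities(tokens, tags))
# [('Barack Obama', 'PER'), ('Peking University', 'ORG'), ('Beijing', 'LOC')]
```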


Named Entity Recognition (NER): Curated Resources

Surveys

  1. Jing Li, Aixin Sun, Jianglei Han, Chenliang Li. A Survey on Deep Learning for Named Entity Recognition.

  2. A Review of Named Entity Recognition (NER) Using Automatic Summarization of Resumes

Models and Algorithms

  1. NCRF++ (LSTM + CRF): Design Challenges and Misconceptions in Neural Sequence Labeling. COLING 2018.

  2. CNN+CRF:

  3. BERT+(LSTM)+CRF:
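In all three model families above, the CRF layer's job at inference time is Viterbi decoding: finding the tag sequence that maximizes the sum of per-token emission scores (from the LSTM, CNN, or BERT encoder) and tag-to-tag transition scores. A minimal sketch in pure Python; the scores in the example are made up, not from a trained model:

```python
def viterbi_decode(emissions, transitions):
    """Find the highest-scoring tag sequence under a linear-chain CRF.

    emissions:   per-token score lists, shape [seq_len][num_tags]
    transitions: transitions[i][j] = score of moving from tag i to tag j
    """
    num_tags = len(emissions[0])
    # score[j] = best score of any path ending in tag j at the current position
    score = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        new_score, bp = [], []
        for j in range(num_tags):
            best_i = max(range(num_tags), key=lambda i: score[i] + transitions[i][j])
            bp.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
        score, backpointers = new_score, backpointers + [bp]
    # Backtrack from the best final tag
    best = max(range(num_tags), key=lambda j: score[j])
    path = [best]
    for bp in reversed(backpointers):
        best = bp[best]
        path.append(best)
    return list(reversed(path)), max(score)

path, best_score = viterbi_decode(
    emissions=[[1.0, 0.0], [0.0, 2.0], [1.5, 0.5]],  # 3 tokens, 2 tags
    transitions=[[0.5, 0.0], [0.0, 0.5]],            # staying on the same tag is rewarded
)
print(path, best_score)  # [0, 1, 0] 4.5
```

Training maximizes the same sequence score of the gold path against the log-sum-exp over all paths (the forward algorithm); decoding only needs the max, as above.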

Getting Started

  1. Applying CRFs in NLP (sequence labeling tasks): a detailed walkthrough of CRF++, a detailed look at the CRF layer in Bi-LSTM+CRF, why a CRF layer is added after the Bi-LSTM, and how the optimization objectives of CRF and Bi-LSTM+CRF differ

  2. A detailed explanation of the CRF in BiLSTM+CRF

  3. The CRF layer in BiLSTM-CRF explained, part 2

  4. The CRF layer in BiLSTM-CRF explained, part 3

  5. CRF vs. LSTM for sequence labeling: which is better?

  6. A comparison of CRF and LSTM

  7. A beginner's guide to named entity recognition (NER)

  8. Basic but not simple: the difficulties and current state of NER

  9. An intuitive explanation of the CRF layer in the BiLSTM-CRF NER model
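A recurring point in the articles above is that a plain (Bi)LSTM scores each tag independently, whereas the CRF layer scores whole tag sequences by adding learned transition scores, so illegal transitions such as O -> I-PER can be penalized. A small sketch of the sequence-scoring idea; all numbers are illustrative, hand-picked for the example:

```python
def sequence_score(tags, emissions, transitions):
    """Score of one tag sequence under a linear-chain CRF:
    sum of per-token emission scores plus sum of transition scores."""
    s = sum(emissions[t][tag] for t, tag in enumerate(tags))
    s += sum(transitions[a][b] for a, b in zip(tags, tags[1:]))
    return s

# Tag inventory: 0 = O, 1 = B-PER, 2 = I-PER.
# The transition O -> I-PER is heavily penalized: I-PER must follow B-PER or I-PER.
transitions = [
    [0.0, 0.0, -10.0],  # from O
    [0.0, 0.0,   1.0],  # from B-PER
    [0.0, 0.0,   1.0],  # from I-PER
]
# Per-token emission scores for a 3-token sentence.
emissions = [
    [0.1, 2.0, 0.0],  # token 1 looks like B-PER
    [1.0, 0.0, 1.2],  # token 2: I-PER slightly beats O
    [2.0, 0.0, 0.0],  # token 3 looks like O
]

legal   = sequence_score([1, 2, 0], emissions, transitions)  # B-PER, I-PER, O
illegal = sequence_score([0, 2, 0], emissions, transitions)  # O, I-PER, O
print(legal, illegal)  # the illegal sequence is pushed far below the legal one
```

A per-token softmax sees only the emission rows, so it cannot express the constraint that the second tag's validity depends on the first.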

Tutorials

1. (PyTorch) Advanced: Making Dynamic Decisions and the Bi-LSTM CRF - [https://pytorch.org/tutorials/beginner/nlp/advanced_tutorial.html]

Code

1. Chinese named entity recognition (concrete implementations of several models: HMM, CRF, BiLSTM, BiLSTM+CRF)

  - [https://github.com/luopeixiang/named_entity_recognition]

Domain Experts

1. Hang Li - Huawei Noah's Ark Lab

2. Jiawei Han - University of Illinois at Urbana-Champaign - [https://hanj.cs.illinois.edu/]

NER Tools

  1. Stanford NER
  2. MALLET
  3. HanLP
  4. NLTK
  5. spaCy
  6. Ohio State University Twitter NER

Related Datasets

  1. CCKS2017: open data for the Chinese electronic medical record evaluation (Evaluation Task 1).

  2. CCKS2018: open entity recognition task in the music domain. Evaluation task:

  - [https://biendata.com/competition/CCKS2018_2/]

  3. NLPCC2018: open spoken language understanding evaluation for task-oriented dialogue systems.

  4. CoNLL 2003

  - [https://www.clips.uantwerpen.be/conll2003/ner/]
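The CoNLL 2003 files are plain text with one token per line in four whitespace-separated columns (word, POS tag, chunk tag, NER tag), blank lines between sentences, and `-DOCSTART-` lines between documents. A minimal reader sketch; the sample string only mimics the column layout, the real data comes from the shared-task distribution:

```python
def read_conll(lines):
    """Parse CoNLL-2003-style lines into sentences of (word, ner_tag) pairs."""
    sentences, current = [], []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("-DOCSTART-"):
            if current:  # a blank line or document marker ends the sentence
                sentences.append(current)
                current = []
            continue
        cols = line.split()  # columns: word, POS, chunk, NER
        current.append((cols[0], cols[3]))
    if current:
        sentences.append(current)
    return sentences

sample = """-DOCSTART- -X- -X- O

U.N. NNP I-NP I-ORG
official NN I-NP O
Ekeus NNP I-NP I-PER
heads VBZ I-VP O
for IN I-PP O
Baghdad NNP I-NP I-LOC
. . O O
""".splitlines()

print(read_conll(sample))
```

Note that the original distribution uses IOB1 tagging, where an entity may begin with an I- tag; many papers convert it to BIO/IOB2 before training.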

Advanced Papers (1999-2020)


Abstract: Information extraction has long attracted attention in natural language processing. It comprises three subtasks: entity extraction, relation extraction, and event extraction, with relation extraction being the core task and a key step of information extraction. The main goal of entity relation extraction is to identify and determine the specific relation that holds between entity pairs in natural language text, which provides basic support for intelligent retrieval, semantic analysis, and other applications, helps improve search efficiency, and promotes the automatic construction of knowledge bases. This survey reviews the history of entity relation extraction and introduces commonly used Chinese and English relation extraction tools and evaluation frameworks. It presents relation extraction methods from four angles: early traditional methods, methods based on traditional machine learning, methods based on deep learning, and open-domain methods; summarizes the mainstream research approaches and representative results of each period; and compares and analyzes the various techniques. Finally, it discusses the key future research directions and development trends of entity relation extraction.

http://crad.ict.ac.cn/CN/10.7544/issn1000-1239.2020.20190358#1


Latest Content

The current COVID-19 pandemic has led to the creation of many corpora that facilitate NLP research and downstream applications to help fight the pandemic. However, most of these corpora are exclusively in English. As the pandemic is a global problem, it is worth creating COVID-19-related datasets for languages other than English. In this paper, we present the first manually annotated COVID-19 domain-specific dataset for Vietnamese. In particular, our dataset is annotated for the named entity recognition (NER) task with newly defined entity types that can be reused for future epidemics. Our dataset also contains the largest number of entities among existing Vietnamese NER datasets. We empirically conduct experiments with strong baselines on our dataset and find that automatic Vietnamese word segmentation helps improve the NER results, and that the highest performance is obtained by fine-tuning pre-trained language models, where the monolingual model PhoBERT for Vietnamese (Nguyen and Nguyen, 2020) produces higher results than the multilingual model XLM-R (Conneau et al., 2020). We publicly release our dataset at: https://github.com/VinAIResearch/PhoNER_COVID19

