Named entity recognition (NER) is the task of identifying text spans that mention named entities and classifying them into predefined categories such as person, location, and organization. NER serves as the basis for a variety of natural language applications, including question answering, text summarization, and machine translation. Although early NER systems achieved good recognition accuracy, they often required considerable human effort to design rules or features by hand. In recent years, deep learning, empowered by continuous real-valued vector representations and semantic composition through nonlinear processing, has been employed in NER systems, yielding state-of-the-art performance. In this paper, we provide a comprehensive review of existing deep learning techniques for NER. We first introduce NER resources, including tagged NER corpora and off-the-shelf NER tools. Then, we systematically categorize existing work using a taxonomy along three axes: distributed representations for input, context encoder, and tag decoder. Next, we survey the most representative methods that apply recent deep learning techniques to new NER problem settings and applications. Finally, we present the challenges faced by NER systems and outline future directions in this area.
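As a concrete illustration of the task defined above, the following minimal sketch turns per-token tags into typed entity spans. It assumes the common BIO tagging convention (`B-` begins an entity, `I-` continues it, `O` is outside any entity); the function name `bio_to_spans`, the example sentence, and the labels `PER` and `LOC` are hypothetical choices for illustration and are not taken from the paper.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collapse per-token BIO tags into (entity text, category) pairs."""
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label: Optional[str] = None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new span; flush any open one first.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_label))
            current_tokens = [token]
            current_label = tag[2:]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # An I- tag with the same category extends the open span.
            current_tokens.append(token)
        else:
            # An "O" tag, or an I- tag that does not continue the open span,
            # closes the span. A stray I- token is dropped in this sketch.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_label))
            current_tokens = []
            current_label = None

    # Flush a span that runs to the end of the sentence.
    if current_tokens:
        spans.append((" ".join(current_tokens), current_label))
    return spans


# Hypothetical example sentence with hand-assigned tags.
tokens = ["Michael", "Jordan", "was", "born", "in", "Brooklyn", "."]
tags = ["B-PER", "I-PER", "O", "O", "O", "B-LOC", "O"]
print(bio_to_spans(tokens, tags))
```

In terms of the three-axis taxonomy, a function like this sits downstream of the tag decoder: the input representation and context encoder would produce the per-token tags, which are hand-written here.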