使用新数据集和LSTM与GRU进行阿拉伯语命名实体识别 (Using LSTM and GRU With a New Dataset for Named Entity Recognition in the Arabic Language) - 专知论文

会员服务 ·

0

命名实体 · GRU · 实体抽取 · 长短期记忆网络 · 实体 ·

2023 年 4 月 6 日

Using LSTM and GRU With a New Dataset for Named Entity Recognition in the Arabic Language

翻译：使用新数据集和LSTM与GRU进行阿拉伯语命名实体识别

Alaa Shaker,Alaa Aldarf,Igor Bessmertny

from arxiv, Proceedings of the 13th Majorov International Conference on Software Engineering and Computer Systems

Named entity recognition (NER) is a natural language processing task (NLP), which aims to identify named entities and classify them like person, location, organization, etc. In the Arabic language, we can find a considerable size of unstructured data, and it needs to different preprocessing tool than languages like (English, Russian, German...). From this point, we can note the importance of building a new structured dataset to solve the lack of structured data. In this work, we use the BIOES format to tag the word, which allows us to handle the nested name entity that consists of more than one sentence and define the start and the end of the name. The dataset consists of more than thirty-six thousand records. In addition, this work proposes long short term memory (LSTM) units and Gated Recurrent Units (GRU) for building the named entity recognition model in the Arabic language. The models give an approximately good result (80%) because LSTM and GRU models can find the relationships between the words of the sentence. Also, use a new library from Google, which is Trax and platform Colab

翻译：命名实体识别（NER）是一项自然语言处理任务（NLP），旨在识别命名实体并将其分类为人物，地点，组织等。在阿拉伯语中，我们可以发现大量的非结构化数据，它需要不同于（英语，俄语，德语等）语言的预处理工具。从这一点可以看出，建立一个新的结构化数据集来解决结构化数据缺失问题的重要性。在这项工作中，我们使用BIOES格式标记单词，这使我们能够处理由多个句子组成的嵌套命名实体并定义名称的开头和结尾。该数据集包括超过三万六千条记录。此外，本论文提出使用LSTM单元和门控循环单元（GRU）构建阿拉伯语命名实体识别模型。模型可以获得近似良好的结果（80%），因为LSTM和GRU模型可以发现句子中的单词之间的关系。此外，使用Google的新库Trax和Colab平台。

0

相关内容

命名实体

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

【ACL2020】命名实体识别即依存解析，Named Entity Recognition as Dependency Parsing

【ACL2020】命名实体识别即依存解析，Named Entity Recognition as Dependency Parsing

专知会员服务

61+阅读 · 2020年5月15日

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

专知会员服务

108+阅读 · 2020年5月1日

【InterSpeech2020】混合语音识别系统中的词汇扩展技术，Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems

【InterSpeech2020】混合语音识别系统中的词汇扩展技术，Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems

专知会员服务

17+阅读 · 2020年3月23日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【论文】使用编码器进行命名实体识别（TENER: Adapting Transformer Encoder for Named Entity Recognition）

【论文】使用编码器进行命名实体识别（TENER: Adapting Transformer Encoder for Named Entity Recognition）

专知会员服务

52+阅读 · 2019年12月28日

【AAAI2020论文-清华大学】Enhanced Meta-Learning for Cross-lingual Named Entity Recognition with Minimal Resources，最小资源增强的元学习跨语言命名实体识别

【AAAI2020论文-清华大学】Enhanced Meta-Learning for Cross-lingual Named Entity Recognition with Minimal Resources，最小资源增强的元学习跨语言命名实体识别

专知会员服务

31+阅读 · 2019年11月17日

【O'Reilly TensorFlow World 2019】使用transformer架构的自然语言处理（Natural language processing using transformer architectures），Kiwisoft的机器学习顾问Aurelien Geron

【O'Reilly TensorFlow World 2019】使用transformer架构的自然语言处理（Natural language processing using transformer architectures），Kiwisoft的机器学习顾问Aurelien Geron

专知会员服务

17+阅读 · 2019年11月14日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

使用BERT做文本摘要

使用BERT做文本摘要

专知

23+阅读 · 2019年12月7日

基于PyTorch/TorchText的自然语言处理库

基于PyTorch/TorchText的自然语言处理库

专知

28+阅读 · 2019年4月22日

NLP - 基于 BERT 的中文命名实体识别（NER)

NLP - 基于 BERT 的中文命名实体识别（NER)

AINLP

466+阅读 · 2019年2月10日

博客 | 关于SLU（意图识别、槽填充、上下文LU、结构化LU）和NLG的论文汇总

博客 | 关于SLU（意图识别、槽填充、上下文LU、结构化LU）和NLG的论文汇总

AI研习社

18+阅读 · 2018年11月30日

pytorch-pretrained-BERT：BERT PyTorch实现，可加载Google BERT预训练模型

pytorch-pretrained-BERT：BERT PyTorch实现，可加载Google BERT预训练模型

AINLP

35+阅读 · 2018年11月6日

基于Lattice LSTM的命名实体识别

基于Lattice LSTM的命名实体识别

微信AI

48+阅读 · 2018年10月19日

【论文推荐】最新八篇情感分析相关论文—Pair-wise判别器、多模态情感分析、上下文语境、Gated 卷积网络

【论文推荐】最新八篇情感分析相关论文—Pair-wise判别器、多模态情感分析、上下文语境、Gated 卷积网络

专知

20+阅读 · 2018年6月29日

【论文推荐】最新五篇命名实体识别相关论文—深度主动学习、Lattice LSTM、混合马尔可夫CRF

【论文推荐】最新五篇命名实体识别相关论文—深度主动学习、Lattice LSTM、混合马尔可夫CRF

专知

26+阅读 · 2018年5月22日

【论文推荐】最新五篇命名实体识别（NER）相关论文—对抗学习、语料库、深度多任务学习、先验知识、跨语言语义

【论文推荐】最新五篇命名实体识别（NER）相关论文—对抗学习、语料库、深度多任务学习、先验知识、跨语言语义

专知

37+阅读 · 2018年2月21日

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

全球人工智能

20+阅读 · 2017年12月17日

维吾尔语命名实体间语义关系抽取理论方法研究

国家自然科学基金

1+阅读 · 2014年12月31日

柬埔寨语命名实体识别及汉柬双语可比语料库构建方法研究

国家自然科学基金

0+阅读 · 2014年12月31日

跨语言信息检索中的机器翻译研究

国家自然科学基金

2+阅读 · 2011年12月31日

利用美国JLab CLAS eg3实验对反应gamma n ->P pi+pi-pi-的研究

国家自然科学基金

0+阅读 · 2011年12月31日

槲皮素增敏阿霉素抗白血病作用及减轻其心肌毒性的实验研究

国家自然科学基金

0+阅读 · 2010年12月31日

基于本体的深层网络数据集成方法研究

国家自然科学基金

2+阅读 · 2009年12月31日

中文句法分析与语义角色标注的联合学习机制研究

国家自然科学基金

1+阅读 · 2009年12月31日

非MHC限制性γ948;T细胞激活在慢性HBV感染中的作用研究

国家自然科学基金

0+阅读 · 2009年12月31日

问答式信息检索中信息抽取技术研究

国家自然科学基金

3+阅读 · 2008年12月31日

EPO抑制创伤性脑水肿的分子机制

国家自然科学基金

0+阅读 · 2008年12月31日

Text Detection and Recognition in the Wild: A Review

Arxiv

20+阅读 · 2020年6月8日

Unsupervised Domain Clusters in Pretrained Language Models

Arxiv

11+阅读 · 2020年4月5日

CAN-NER: Convolutional Attention Network forChinese Named Entity Recognition

Arxiv

16+阅读 · 2019年4月3日

A Survey on Deep Learning for Named Entity Recognition

A Survey on Deep Learning for Named Entity Recognition

Arxiv

73+阅读 · 2018年12月22日

Multilingual Sentiment Analysis: An RNN-Based Framework for Limited Data

Arxiv

12+阅读 · 2018年6月8日

Strong Baselines for Simple Question Answering over Knowledge Graphs with and without Neural Networks

Arxiv

17+阅读 · 2018年6月5日

Chinese NER Using Lattice LSTM

Arxiv

14+阅读 · 2018年5月15日

Incorporating Dictionaries into Deep Neural Networks for the Chinese Clinical Named Entity Recognition

Arxiv

12+阅读 · 2018年4月13日

Graph Convolutional Networks for Named Entity Recognition

Arxiv

17+阅读 · 2018年2月14日

Deep Active Learning for Named Entity Recognition

Arxiv

15+阅读 · 2018年2月4日

VIP会员

文章信息

相关主题

长短期记忆网络

相关VIP内容

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

【ACL2020】命名实体识别即依存解析，Named Entity Recognition as Dependency Parsing

【ACL2020】命名实体识别即依存解析，Named Entity Recognition as Dependency Parsing

专知会员服务

61+阅读 · 2020年5月15日

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

【2020新书】自然语言处理Python与spaCy实践，216页pdf，NLP with Python

专知会员服务

108+阅读 · 2020年5月1日

【InterSpeech2020】混合语音识别系统中的词汇扩展技术，Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems

【InterSpeech2020】混合语音识别系统中的词汇扩展技术，Techniques for Vocabulary Expansion in Hybrid Speech Recognition Systems

专知会员服务

17+阅读 · 2020年3月23日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【论文】使用编码器进行命名实体识别（TENER: Adapting Transformer Encoder for Named Entity Recognition）

【论文】使用编码器进行命名实体识别（TENER: Adapting Transformer Encoder for Named Entity Recognition）

专知会员服务

52+阅读 · 2019年12月28日

【AAAI2020论文-清华大学】Enhanced Meta-Learning for Cross-lingual Named Entity Recognition with Minimal Resources，最小资源增强的元学习跨语言命名实体识别

【AAAI2020论文-清华大学】Enhanced Meta-Learning for Cross-lingual Named Entity Recognition with Minimal Resources，最小资源增强的元学习跨语言命名实体识别

专知会员服务

31+阅读 · 2019年11月17日

【O'Reilly TensorFlow World 2019】使用transformer架构的自然语言处理（Natural language processing using transformer architectures），Kiwisoft的机器学习顾问Aurelien Geron

【O'Reilly TensorFlow World 2019】使用transformer架构的自然语言处理（Natural language processing using transformer architectures），Kiwisoft的机器学习顾问Aurelien Geron

专知会员服务

17+阅读 · 2019年11月14日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

热门VIP内容

开通专知VIP会员享更多权益服务

《英国智库：瓦解俄罗斯防空系统生产，夺回制空权》最新报告

【伯克利博士论文】从推理服务到训练：面向大规模 LLM 智能体的高效系统

《战术突击工具包：军队的“边缘”操作系统》报告

《认知战的历史视角：从冷战心理战行动到AI驱动的信息战》最新报告

相关资讯

使用BERT做文本摘要

使用BERT做文本摘要

专知

23+阅读 · 2019年12月7日

基于PyTorch/TorchText的自然语言处理库

基于PyTorch/TorchText的自然语言处理库

专知

28+阅读 · 2019年4月22日

NLP - 基于 BERT 的中文命名实体识别（NER)

NLP - 基于 BERT 的中文命名实体识别（NER)

AINLP

466+阅读 · 2019年2月10日

博客 | 关于SLU（意图识别、槽填充、上下文LU、结构化LU）和NLG的论文汇总

博客 | 关于SLU（意图识别、槽填充、上下文LU、结构化LU）和NLG的论文汇总

AI研习社

18+阅读 · 2018年11月30日

pytorch-pretrained-BERT：BERT PyTorch实现，可加载Google BERT预训练模型

pytorch-pretrained-BERT：BERT PyTorch实现，可加载Google BERT预训练模型

AINLP

35+阅读 · 2018年11月6日

基于Lattice LSTM的命名实体识别

基于Lattice LSTM的命名实体识别

微信AI

48+阅读 · 2018年10月19日

【论文推荐】最新八篇情感分析相关论文—Pair-wise判别器、多模态情感分析、上下文语境、Gated 卷积网络

【论文推荐】最新八篇情感分析相关论文—Pair-wise判别器、多模态情感分析、上下文语境、Gated 卷积网络

专知

20+阅读 · 2018年6月29日

【论文推荐】最新五篇命名实体识别相关论文—深度主动学习、Lattice LSTM、混合马尔可夫CRF

【论文推荐】最新五篇命名实体识别相关论文—深度主动学习、Lattice LSTM、混合马尔可夫CRF

专知

26+阅读 · 2018年5月22日

【论文推荐】最新五篇命名实体识别（NER）相关论文—对抗学习、语料库、深度多任务学习、先验知识、跨语言语义

【论文推荐】最新五篇命名实体识别（NER）相关论文—对抗学习、语料库、深度多任务学习、先验知识、跨语言语义

专知

37+阅读 · 2018年2月21日

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

全球人工智能

20+阅读 · 2017年12月17日

相关论文

Text Detection and Recognition in the Wild: A Review

Arxiv

20+阅读 · 2020年6月8日

Unsupervised Domain Clusters in Pretrained Language Models

Arxiv

11+阅读 · 2020年4月5日

CAN-NER: Convolutional Attention Network forChinese Named Entity Recognition

Arxiv

16+阅读 · 2019年4月3日

A Survey on Deep Learning for Named Entity Recognition

A Survey on Deep Learning for Named Entity Recognition

Arxiv

73+阅读 · 2018年12月22日

Multilingual Sentiment Analysis: An RNN-Based Framework for Limited Data

Arxiv

12+阅读 · 2018年6月8日

Strong Baselines for Simple Question Answering over Knowledge Graphs with and without Neural Networks

Arxiv

17+阅读 · 2018年6月5日

Chinese NER Using Lattice LSTM

Arxiv

14+阅读 · 2018年5月15日

Incorporating Dictionaries into Deep Neural Networks for the Chinese Clinical Named Entity Recognition

Arxiv

12+阅读 · 2018年4月13日

Graph Convolutional Networks for Named Entity Recognition

Arxiv

17+阅读 · 2018年2月14日

Deep Active Learning for Named Entity Recognition

Arxiv

15+阅读 · 2018年2月4日

相关基金

维吾尔语命名实体间语义关系抽取理论方法研究

国家自然科学基金

1+阅读 · 2014年12月31日

柬埔寨语命名实体识别及汉柬双语可比语料库构建方法研究

国家自然科学基金

0+阅读 · 2014年12月31日

跨语言信息检索中的机器翻译研究

国家自然科学基金

2+阅读 · 2011年12月31日

利用美国JLab CLAS eg3实验对反应gamma n ->P pi+pi-pi-的研究

国家自然科学基金

0+阅读 · 2011年12月31日

槲皮素增敏阿霉素抗白血病作用及减轻其心肌毒性的实验研究

国家自然科学基金

0+阅读 · 2010年12月31日

基于本体的深层网络数据集成方法研究

国家自然科学基金

2+阅读 · 2009年12月31日

中文句法分析与语义角色标注的联合学习机制研究

国家自然科学基金

1+阅读 · 2009年12月31日

非MHC限制性γ948;T细胞激活在慢性HBV感染中的作用研究

国家自然科学基金

0+阅读 · 2009年12月31日

问答式信息检索中信息抽取技术研究

国家自然科学基金

3+阅读 · 2008年12月31日

EPO抑制创伤性脑水肿的分子机制

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员