通过查询词组表达式自动创建命名实体识别数据集 (Automatic Creation of Named Entity Recognition Datasets by Querying Phrase Representations) - 专知论文

会员服务 ·

0

命名实体识别 · entity · MoDELS · 可约的 · 数据集 ·

2022 年 10 月 14 日

Automatic Creation of Named Entity Recognition Datasets by Querying Phrase Representations

翻译：通过查询词组表达式自动创建命名实体识别数据集

Hyunjae Kim,Jaehyo Yoo,Seunghyun Yoon,Jaewoo Kang

Most weakly supervised named entity recognition (NER) models rely on domain-specific dictionaries provided by experts. This approach is infeasible in many domains where dictionaries do not exist. While a phrase retrieval model was used to construct pseudo-dictionaries with entities retrieved from Wikipedia automatically in a recent study, these dictionaries often have limited coverage because the retriever is likely to retrieve popular entities rather than rare ones. In this study, a phrase embedding search to efficiently create high-coverage dictionaries is presented. Specifically, the reformulation of natural language queries into phrase representations allows the retriever to search a space densely populated with various entities. In addition, we present a novel framework, HighGEN, that generates NER datasets with high-coverage dictionaries obtained using the phrase embedding search. HighGEN generates weak labels based on the distance between the embeddings of a candidate phrase and target entity type to reduce the noise in high-coverage dictionaries. We compare HighGEN with current weakly supervised NER models on six NER benchmarks and demonstrate the superiority of our models.

翻译：最受监督最弱的命名实体识别(NER)模式依赖于专家提供的域名词典。这种方法在许多领域是行不通的。虽然在最近的一项研究中使用了一个短语检索模型,用从维基百科自动检索的实体建造假词典,但这些词典的覆盖范围往往有限,因为检索器可能检索到受欢迎的实体,而不是罕见的实体。在本研究中,介绍了一个短语嵌入搜索以高效创建高覆盖词典。具体地说,将自然语言查询改写成短语表示法,使检索器能够搜索一个空间,在多个实体中人口稠密。此外,我们提出了一个新的框架,即高GEN,用嵌入式搜索的短语生成高覆盖词典数据集。高GEN根据候选词句和目标实体类型之间的距离生成了薄弱标签,以减少高覆盖词典中的噪音。我们比较了高GEN目前在六个网域基准上受监管薄弱的NER模型,并展示了我们模型的优越性。

0

相关内容

命名实体识别

命名实体识别

命名实体识别（NER）（也称为实体标识，实体组块和实体提取）是信息抽取的子任务，旨在将非结构化文本中提到的命名实体定位和分类为预定义类别，例如人员姓名、地名、机构名、专有名词等。

知识荟萃

精品入门和进阶教程、论文和代码整理等

更多

查看相关VIP内容、论文、资讯等

UIUC韩家炜：从海量非结构化文本中挖掘结构化知识

UIUC韩家炜：从海量非结构化文本中挖掘结构化知识

专知会员服务

98+阅读 · 2021年12月30日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

80+阅读 · 2020年7月26日

【ACL2020】命名实体识别即依存解析，Named Entity Recognition as Dependency Parsing

【ACL2020】命名实体识别即依存解析，Named Entity Recognition as Dependency Parsing

专知会员服务

61+阅读 · 2020年5月15日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

专知会员服务

15+阅读 · 2019年10月23日

ExBert — 可视化分析Transformer学到的表示

ExBert — 可视化分析Transformer学到的表示

专知会员服务

32+阅读 · 2019年10月16日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

181+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Workshop

【ICIG2021】Latest News & Announcements of the Workshop

中国图象图形学学会CSIG

0+阅读 · 2021年12月20日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium4

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium4

中国图象图形学学会CSIG

0+阅读 · 2021年11月10日

会议交流 | IJCKG: International Joint Conference on Knowledge Graphs

会议交流 | IJCKG: International Joint Conference on Knowledge Graphs

开放知识图谱

0+阅读 · 2021年9月9日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新六篇视觉问答相关论文—深度嵌入学习、句子表征学习、深度特征聚合、3D匹配、细粒度文本摘要

【论文推荐】最新六篇视觉问答相关论文—深度嵌入学习、句子表征学习、深度特征聚合、3D匹配、细粒度文本摘要

专知

12+阅读 · 2018年6月9日

Wnt信号通路中β-catenin及其细胞核内共激活因子复合物的结构生物学研究

国家自然科学基金

0+阅读 · 2015年12月31日

耶尔森氏菌LcrV结合Toll样受体2的结构机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

维吾尔语命名实体间语义关系抽取理论方法研究

国家自然科学基金

1+阅读 · 2014年12月31日

D-serine在癫痫发生中的作用机制

国家自然科学基金

0+阅读 · 2013年12月31日

华支睾吸虫病致肝纤维化分泌排泄抗原新靶点果糖1,6二磷酸的CCK/Leptin途径的研究

国家自然科学基金

0+阅读 · 2013年12月31日

microRNA调节肿瘤抑制因子Caliban应答DNA损伤的机制

国家自然科学基金

1+阅读 · 2012年12月31日

Cu基类金刚石结构新型热电化合物的设计与优化

国家自然科学基金

0+阅读 · 2012年12月31日

肝星状细胞诱导生成调节T细胞在肝纤维化机制中的研究

国家自然科学基金

0+阅读 · 2012年12月31日

PKCα调控netrin-1/UNC5B信号通路促进肾癌细胞存活机制的研究

国家自然科学基金

0+阅读 · 2011年12月31日

miR-140在肿瘤转移中的作用及机制研究

国家自然科学基金

0+阅读 · 2011年12月31日

ConNER: Consistency Training for Cross-lingual Named Entity Recognition

Arxiv

0+阅读 · 2022年11月17日

Data-Efficient Autoregressive Document Retrieval for Fact Verification

Arxiv

0+阅读 · 2022年11月17日

CoupleFace: Relation Matters for Face Recognition Distillation

Arxiv

0+阅读 · 2022年11月17日

Using segment-based features of jaw movements to recognize foraging activities in grazing cattle

Arxiv

0+阅读 · 2022年11月15日

MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition

Arxiv

0+阅读 · 2022年11月15日

A Survey on Deep Learning for Named Entity Recognition

A Survey on Deep Learning for Named Entity Recognition

Arxiv

26+阅读 · 2020年3月13日

Incorporating Dictionaries into Deep Neural Networks for the Chinese Clinical Named Entity Recognition

Arxiv

12+阅读 · 2018年4月13日

Zero-shot Recognition via Semantic Embeddings and Knowledge Graphs

Arxiv

18+阅读 · 2018年4月8日

Graph Convolutional Networks for Named Entity Recognition

Arxiv

17+阅读 · 2018年2月14日

Deep Active Learning for Named Entity Recognition

Arxiv

15+阅读 · 2018年2月4日

VIP会员

文章信息

相关主题

命名实体识别

相关VIP内容

UIUC韩家炜：从海量非结构化文本中挖掘结构化知识

UIUC韩家炜：从海量非结构化文本中挖掘结构化知识

专知会员服务

98+阅读 · 2021年12月30日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

80+阅读 · 2020年7月26日

【ACL2020】命名实体识别即依存解析，Named Entity Recognition as Dependency Parsing

【ACL2020】命名实体识别即依存解析，Named Entity Recognition as Dependency Parsing

专知会员服务

61+阅读 · 2020年5月15日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

专知会员服务

15+阅读 · 2019年10月23日

ExBert — 可视化分析Transformer学到的表示

ExBert — 可视化分析Transformer学到的表示

专知会员服务

32+阅读 · 2019年10月16日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

181+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

热门VIP内容

开通专知VIP会员享更多权益服务

《基于大型语言模型的软件工程自动化研究》最新264页

《基于大型语言模型的信号处理管线研究：推进军事电子情报工作流程》最新76页

中文版 | 战争算法：生成式人工智能在战场的崛起

中文版《美国陆军：战术行为性远程医疗实施观察与建议》

相关资讯

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Workshop

【ICIG2021】Latest News & Announcements of the Workshop

中国图象图形学学会CSIG

0+阅读 · 2021年12月20日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium4

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium4

中国图象图形学学会CSIG

0+阅读 · 2021年11月10日

会议交流 | IJCKG: International Joint Conference on Knowledge Graphs

会议交流 | IJCKG: International Joint Conference on Knowledge Graphs

开放知识图谱

0+阅读 · 2021年9月9日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新六篇视觉问答相关论文—深度嵌入学习、句子表征学习、深度特征聚合、3D匹配、细粒度文本摘要

【论文推荐】最新六篇视觉问答相关论文—深度嵌入学习、句子表征学习、深度特征聚合、3D匹配、细粒度文本摘要

专知

12+阅读 · 2018年6月9日

相关论文

ConNER: Consistency Training for Cross-lingual Named Entity Recognition

Arxiv

0+阅读 · 2022年11月17日

Data-Efficient Autoregressive Document Retrieval for Fact Verification

Arxiv

0+阅读 · 2022年11月17日

CoupleFace: Relation Matters for Face Recognition Distillation

Arxiv

0+阅读 · 2022年11月17日

Using segment-based features of jaw movements to recognize foraging activities in grazing cattle

Arxiv

0+阅读 · 2022年11月15日

MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition

Arxiv

0+阅读 · 2022年11月15日

A Survey on Deep Learning for Named Entity Recognition

A Survey on Deep Learning for Named Entity Recognition

Arxiv

26+阅读 · 2020年3月13日

Incorporating Dictionaries into Deep Neural Networks for the Chinese Clinical Named Entity Recognition

Arxiv

12+阅读 · 2018年4月13日

Zero-shot Recognition via Semantic Embeddings and Knowledge Graphs

Arxiv

18+阅读 · 2018年4月8日

Graph Convolutional Networks for Named Entity Recognition

Arxiv

17+阅读 · 2018年2月14日

Deep Active Learning for Named Entity Recognition

Arxiv

15+阅读 · 2018年2月4日

相关基金

Wnt信号通路中β-catenin及其细胞核内共激活因子复合物的结构生物学研究

国家自然科学基金

0+阅读 · 2015年12月31日

耶尔森氏菌LcrV结合Toll样受体2的结构机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

维吾尔语命名实体间语义关系抽取理论方法研究

国家自然科学基金

1+阅读 · 2014年12月31日

D-serine在癫痫发生中的作用机制

国家自然科学基金

0+阅读 · 2013年12月31日

华支睾吸虫病致肝纤维化分泌排泄抗原新靶点果糖1,6二磷酸的CCK/Leptin途径的研究

国家自然科学基金

0+阅读 · 2013年12月31日

microRNA调节肿瘤抑制因子Caliban应答DNA损伤的机制

国家自然科学基金

1+阅读 · 2012年12月31日

Cu基类金刚石结构新型热电化合物的设计与优化

国家自然科学基金

0+阅读 · 2012年12月31日

肝星状细胞诱导生成调节T细胞在肝纤维化机制中的研究

国家自然科学基金

0+阅读 · 2012年12月31日

PKCα调控netrin-1/UNC5B信号通路促进肾癌细胞存活机制的研究

国家自然科学基金

0+阅读 · 2011年12月31日

miR-140在肿瘤转移中的作用及机制研究

国家自然科学基金

0+阅读 · 2011年12月31日

微信扫码咨询专知VIP会员