建筑使用非语音说明的低资源NER模型 (Building Low-Resource NER Models Using Non-Speaker Annotation) - 专知论文

会员服务 ·

0

命名实体识别 · entity · MoDELS · Notability · Performer ·

2021 年 4 月 26 日

Building Low-Resource NER Models Using Non-Speaker Annotation

翻译：建筑使用非语音说明的低资源NER模型

Tatiana Tsygankova,Francesca Marini,Stephen Mayhew,Dan Roth

from arxiv, Accepted to DASH-LA 2021, workshop at NAACL 2021

In low-resource natural language processing (NLP), the key problems are a lack of target language training data, and a lack of native speakers to create it. Cross-lingual methods have had notable success in addressing these concerns, but in certain common circumstances, such as insufficient pre-training corpora or languages far from the source language, their performance suffers. In this work we propose a complementary approach to building low-resource Named Entity Recognition (NER) models using ``non-speaker'' (NS) annotations, provided by annotators with no prior experience in the target language. We recruit 30 participants in a carefully controlled annotation experiment with Indonesian, Russian, and Hindi. We show that use of NS annotators produces results that are consistently on par or better than cross-lingual methods built on modern contextual representations, and have the potential to outperform with additional effort. We conclude with observations of common annotation patterns and recommended implementation practices, and motivate how NS annotations can be used in addition to prior methods for improved performance. For more details, http://cogcomp.org/page/publication_view/941

翻译：在低资源自然语言处理(NLP)中,关键问题是缺乏目标语言培训数据,以及缺乏当地语言使用者来创建这种数据。跨语言方法在解决这些问题上取得了显著的成功,但在某些共同的情况下,例如培训前社团或远离原始语言的语言不够充分,其业绩受到影响。在这项工作中,我们建议采用补充方法,用“non-speaker''(NS)”说明来建立低资源命名实体识别模型,由在目标语言方面没有经验的注解员提供。我们征聘了30名参与者,与印度尼西亚、俄罗斯和印地语进行了仔细控制的注解试验。我们表明,使用NS说明者所产生的结果始终处于同等水平或优于建立在现代背景表述上的跨语言方法,并且有可能超越额外的努力。我们最后对通用注解模式和建议实施做法进行了观察,并激励除了先前改进绩效的方法之外如何使用NS说明。关于改进绩效的方法的更多细节,见http://cogcomp.org/page/publication_view/941。

0

相关内容

命名实体识别

命名实体识别

命名实体识别（NER）（也称为实体标识，实体组块和实体提取）是信息抽取的子任务，旨在将非结构化文本中提到的命名实体定位和分类为预定义类别，例如人员姓名、地名、机构名、专有名词等。

知识荟萃

精品入门和进阶教程、论文和代码整理等

更多

查看相关VIP内容、论文、资讯等

【CVPR2021】半监督迁移学习的自适应一致性正则化

专知会员服务

33+阅读 · 2021年3月7日

【领域对抗学习的低资源文本分类】Low-Resource Text Classification using Domain-Adversarial Learning

【领域对抗学习的低资源文本分类】Low-Resource Text Classification using Domain-Adversarial Learning

专知会员服务

23+阅读 · 2020年4月22日

【CMU-TACL2020】低资源跨语言实体链接，Low-resource Crosslingual EntityLinking

专知会员服务

17+阅读 · 2020年3月29日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

【微软雷德蒙研究院】小样本自然语言生成，Few-shot Natural Language Generation for Task-Oriented Dialog

【微软雷德蒙研究院】小样本自然语言生成，Few-shot Natural Language Generation for Task-Oriented Dialog

专知会员服务

33+阅读 · 2020年2月29日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【多伦多大学】神经数据服务器:用于传输学习数据的大型搜索引擎，Neural Data Server: A Large-Scale Search Engine for Transfer Learning Data

【多伦多大学】神经数据服务器:用于传输学习数据的大型搜索引擎，Neural Data Server: A Large-Scale Search Engine for Transfer Learning Data

专知会员服务

7+阅读 · 2020年1月9日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

已删除

将门创投

9+阅读 · 2019年11月15日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

TorchSeg：基于pytorch的语义分割算法开源了

TorchSeg：基于pytorch的语义分割算法开源了

极市平台

20+阅读 · 2019年1月28日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

【论文推荐】最新五篇命名实体识别（NER）相关论文—对抗学习、语料库、深度多任务学习、先验知识、跨语言语义

【论文推荐】最新五篇命名实体识别（NER）相关论文—对抗学习、语料库、深度多任务学习、先验知识、跨语言语义

专知

37+阅读 · 2018年2月21日

分布式TensorFlow入门指南

分布式TensorFlow入门指南

机器学习研究会

4+阅读 · 2017年11月28日

【推荐】YOLO实时目标检测(6fps)

【推荐】YOLO实时目标检测(6fps)

机器学习研究会

20+阅读 · 2017年11月5日

The Multilingual TEDx Corpus for Speech Recognition and Translation

Arxiv

0+阅读 · 2021年6月15日

Leveraging Pre-trained Language Model for Speech Sentiment Analysis

Arxiv

0+阅读 · 2021年6月11日

N-Best ASR Transformer: Enhancing SLU Performance using Multiple ASR Hypotheses

N-Best ASR Transformer: Enhancing SLU Performance using Multiple ASR Hypotheses

Arxiv

0+阅读 · 2021年6月11日

Spoken Term Detection Methods for Sparse Transcription in Very Low-resource Settings

Arxiv

0+阅读 · 2021年6月11日

MBIC -- A Media Bias Annotation Dataset Including Annotator Characteristics

Arxiv

0+阅读 · 2021年5月20日

Graph-Evolving Meta-Learning for Low-Resource Medical Dialogue Generation

Arxiv

20+阅读 · 2020年12月22日

Meta Learning for End-to-End Low-Resource Speech Recognition

Meta Learning for End-to-End Low-Resource Speech Recognition

Arxiv

20+阅读 · 2019年10月26日

End-to-end Speech Recognition with Word-based RNN Language Models

End-to-end Speech Recognition with Word-based RNN Language Models

Arxiv

3+阅读 · 2018年8月8日

Approaching Neural Grammatical Error Correction as a Low-Resource Machine Translation Task

Arxiv

3+阅读 · 2018年4月16日

Adversarial Learning for Chinese NER from Crowd Annotations

Arxiv

15+阅读 · 2018年1月16日

VIP会员

文章信息

相关主题

命名实体识别

相关VIP内容

【CVPR2021】半监督迁移学习的自适应一致性正则化

专知会员服务

33+阅读 · 2021年3月7日

【领域对抗学习的低资源文本分类】Low-Resource Text Classification using Domain-Adversarial Learning

【领域对抗学习的低资源文本分类】Low-Resource Text Classification using Domain-Adversarial Learning

专知会员服务

23+阅读 · 2020年4月22日

【CMU-TACL2020】低资源跨语言实体链接，Low-resource Crosslingual EntityLinking

专知会员服务

17+阅读 · 2020年3月29日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

【微软雷德蒙研究院】小样本自然语言生成，Few-shot Natural Language Generation for Task-Oriented Dialog

【微软雷德蒙研究院】小样本自然语言生成，Few-shot Natural Language Generation for Task-Oriented Dialog

专知会员服务

33+阅读 · 2020年2月29日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【多伦多大学】神经数据服务器:用于传输学习数据的大型搜索引擎，Neural Data Server: A Large-Scale Search Engine for Transfer Learning Data

【多伦多大学】神经数据服务器:用于传输学习数据的大型搜索引擎，Neural Data Server: A Large-Scale Search Engine for Transfer Learning Data

专知会员服务

7+阅读 · 2020年1月9日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《步兵小单元山地严寒作战指南》美军最新条令200页

《联合作战概念的发展》最新报告

俄制无人机弹药

《复杂场景下自主着陆的模型预测控制技术》92页

相关资讯

已删除

将门创投

9+阅读 · 2019年11月15日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

TorchSeg：基于pytorch的语义分割算法开源了

TorchSeg：基于pytorch的语义分割算法开源了

极市平台

20+阅读 · 2019年1月28日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

【论文推荐】最新五篇命名实体识别（NER）相关论文—对抗学习、语料库、深度多任务学习、先验知识、跨语言语义

【论文推荐】最新五篇命名实体识别（NER）相关论文—对抗学习、语料库、深度多任务学习、先验知识、跨语言语义

专知

37+阅读 · 2018年2月21日

分布式TensorFlow入门指南

分布式TensorFlow入门指南

机器学习研究会

4+阅读 · 2017年11月28日

【推荐】YOLO实时目标检测(6fps)

【推荐】YOLO实时目标检测(6fps)

机器学习研究会

20+阅读 · 2017年11月5日

相关论文

The Multilingual TEDx Corpus for Speech Recognition and Translation

Arxiv

0+阅读 · 2021年6月15日

Leveraging Pre-trained Language Model for Speech Sentiment Analysis

Arxiv

0+阅读 · 2021年6月11日

N-Best ASR Transformer: Enhancing SLU Performance using Multiple ASR Hypotheses

N-Best ASR Transformer: Enhancing SLU Performance using Multiple ASR Hypotheses

Arxiv

0+阅读 · 2021年6月11日

Spoken Term Detection Methods for Sparse Transcription in Very Low-resource Settings

Arxiv

0+阅读 · 2021年6月11日

MBIC -- A Media Bias Annotation Dataset Including Annotator Characteristics

Arxiv

0+阅读 · 2021年5月20日

Graph-Evolving Meta-Learning for Low-Resource Medical Dialogue Generation

Arxiv

20+阅读 · 2020年12月22日

Meta Learning for End-to-End Low-Resource Speech Recognition

Meta Learning for End-to-End Low-Resource Speech Recognition

Arxiv

20+阅读 · 2019年10月26日

End-to-end Speech Recognition with Word-based RNN Language Models

End-to-end Speech Recognition with Word-based RNN Language Models

Arxiv

3+阅读 · 2018年8月8日

Approaching Neural Grammatical Error Correction as a Low-Resource Machine Translation Task

Arxiv

3+阅读 · 2018年4月16日

Adversarial Learning for Chinese NER from Crowd Annotations

Arxiv

15+阅读 · 2018年1月16日

微信扫码咨询专知VIP会员