Researchers have traditionally recruited native speakers to provide annotations for widely used benchmark datasets. But for some languages, recruiting native speakers is difficult, and it would help if learners of those languages could annotate the data. In this paper, we investigate whether language learners can contribute annotations to benchmark datasets. In a carefully controlled annotation experiment, we recruit 36 language learners, provide two types of additional resources (dictionaries and machine-translated sentences), and perform mini-tests to measure their language proficiency. We target three languages, English, Korean, and Indonesian, and four NLP tasks, sentiment analysis, natural language inference, named entity recognition, and machine reading comprehension. We find that language learners, especially those with intermediate or advanced language proficiency, are able to provide fairly accurate labels with the help of additional resources. Moreover, we show that data annotation improves learners' language proficiency in terms of vocabulary and grammar. The implication of our findings is that broadening the annotation task to include language learners can open up the opportunity to build benchmark datasets for languages for which it is difficult to recruit native speakers.