Illitetates社区的建筑公司代表:审查发展中国家面临的挑战和减轻战略 (Building Representative Corpora from Illiterate Communities: A Review of Challenges and Mitigation Strategies for Developing Countries) - 专知论文

会员服务 ·

0

可辨认的 · NLP · 系统设计 · 有偏 · GROUP ·

2021 年 2 月 4 日

Building Representative Corpora from Illiterate Communities: A Review of Challenges and Mitigation Strategies for Developing Countries

翻译：Illitetates社区的建筑公司代表:审查发展中国家面临的挑战和减轻战略

Stephanie Hirmer,Alycia Leonard,Josephine Tumwesige,Costanza Conforti

from arxiv, Accepted at EACL 2021

Most well-established data collection methods currently adopted in NLP depend on the assumption of speaker literacy. Consequently, the collected corpora largely fail to represent swathes of the global population, which tend to be some of the most vulnerable and marginalised people in society, and often live in rural developing areas. Such underrepresented groups are thus not only ignored when making modeling and system design decisions, but also prevented from benefiting from development outcomes achieved through data-driven NLP. This paper aims to address the under-representation of illiterate communities in NLP corpora: we identify potential biases and ethical issues that might arise when collecting data from rural communities with high illiteracy rates in Low-Income Countries, and propose a set of practical mitigation strategies to help future work.

翻译：因此,所收集的社团基本上不能代表全球人口,而全球人口往往是社会上最脆弱和处于社会边缘地位的一些人口,而且往往生活在发展中农村地区,因此,这些代表人数不足的群体不仅在作出建模和系统设计决定时被忽视,而且无法从通过以数据为驱动的NLP取得的发展成果中受益。本文旨在解决在NLP公司中文盲社区代表人数不足的问题:我们查明在收集低收入国家文盲率高的农村社区的数据时可能出现的潜在偏见和道德问题,并提出一套实际的缓解战略,以帮助今后的工作。

0

相关内容

可辨认的

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

多模态学习方法综述

专知会员服务

234+阅读 · 2020年5月6日

Python分布式计算，171页pdf，Distributed Computing with Python

Python分布式计算，171页pdf，Distributed Computing with Python

专知会员服务

108+阅读 · 2020年5月3日

【中科院】命名实体识别技术综述

专知会员服务

157+阅读 · 2020年4月21日

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

专知会员服务

41+阅读 · 2020年4月11日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

已删除

将门创投

7+阅读 · 2018年8月28日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

计算机类 | 期刊专刊截稿信息9条

计算机类 | 期刊专刊截稿信息9条

Call4Papers

4+阅读 · 2018年1月26日

A Comprehensive Survey on Knowledge Graph Entity Alignment via Representation Learning

Arxiv

0+阅读 · 2021年3月28日

From Seeing to Moving: A Survey on Learning for Visual Indoor Navigation (VIN)

Arxiv

0+阅读 · 2021年3月26日

Generalizing to Unseen Domains: A Survey on Domain Generalization

Arxiv

30+阅读 · 2021年3月10日

Explainable Recommender Systems via Resolving Learning Representations

Arxiv

13+阅读 · 2020年8月21日

Investigating and Mitigating Degree-Related Biases in Graph Convolutional Networks

Arxiv

6+阅读 · 2020年8月13日

Challenges in Building Intelligent Open-domain Dialog Systems

Arxiv

8+阅读 · 2019年10月22日

Deep learning for time series classification: a review

Arxiv

12+阅读 · 2019年3月14日

Knowledge Representation Learning: A Quantitative Review

Knowledge Representation Learning: A Quantitative Review

Arxiv

28+阅读 · 2018年12月28日

Topic Modelling of Empirical Text Corpora: Validity, Reliability, and Reproducibility in Comparison to Semantic Maps

Arxiv

4+阅读 · 2018年6月4日

Current Challenges and Visions in Music Recommender Systems Research

Arxiv

7+阅读 · 2018年3月21日

VIP会员

文章信息

相关主题

相关VIP内容

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

多模态学习方法综述

专知会员服务

234+阅读 · 2020年5月6日

Python分布式计算，171页pdf，Distributed Computing with Python

Python分布式计算，171页pdf，Distributed Computing with Python

专知会员服务

108+阅读 · 2020年5月3日

【中科院】命名实体识别技术综述

专知会员服务

157+阅读 · 2020年4月21日

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

专知会员服务

41+阅读 · 2020年4月11日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

《人工智能模型风险目录：开发者与研究者对现实世界AI危害的认知盲区》

《印美国防合作：“自力更生”计划》最新126页报告

构建新大脑：将军事院校转型为AI作战实验室

《革命性软件智能：融合神经程序合成、量子安全运维与可解释人工智能的下一代自主系统统一框架》最新报告

相关资讯

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

已删除

将门创投

7+阅读 · 2018年8月28日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

计算机类 | 期刊专刊截稿信息9条

计算机类 | 期刊专刊截稿信息9条

Call4Papers

4+阅读 · 2018年1月26日

相关论文

A Comprehensive Survey on Knowledge Graph Entity Alignment via Representation Learning

Arxiv

0+阅读 · 2021年3月28日

From Seeing to Moving: A Survey on Learning for Visual Indoor Navigation (VIN)

Arxiv

0+阅读 · 2021年3月26日

Generalizing to Unseen Domains: A Survey on Domain Generalization

Arxiv

30+阅读 · 2021年3月10日

Explainable Recommender Systems via Resolving Learning Representations

Arxiv

13+阅读 · 2020年8月21日

Investigating and Mitigating Degree-Related Biases in Graph Convolutional Networks

Arxiv

6+阅读 · 2020年8月13日

Challenges in Building Intelligent Open-domain Dialog Systems

Arxiv

8+阅读 · 2019年10月22日

Deep learning for time series classification: a review

Arxiv

12+阅读 · 2019年3月14日

Knowledge Representation Learning: A Quantitative Review

Knowledge Representation Learning: A Quantitative Review

Arxiv

28+阅读 · 2018年12月28日

Topic Modelling of Empirical Text Corpora: Validity, Reliability, and Reproducibility in Comparison to Semantic Maps

Arxiv

4+阅读 · 2018年6月4日

Current Challenges and Visions in Music Recommender Systems Research

Arxiv

7+阅读 · 2018年3月21日

微信扫码咨询专知VIP会员