多语种蒙面语言模式中的性别偏见 (Gender Bias in Masked Language Models for Multiple Languages) - 专知论文

会员服务 ·

0

有偏 · 掩码语言模型化 · 语言模型化 · 掩码 · MoDELS ·

2022 年 5 月 4 日

Gender Bias in Masked Language Models for Multiple Languages

翻译：多语种蒙面语言模式中的性别偏见

Masahiro Kaneko,Aizhan Imankulova,Danushka Bollegala,Naoaki Okazaki

from arxiv, NAACL 2022

Masked Language Models (MLMs) pre-trained by predicting masked tokens on large corpora have been used successfully in natural language processing tasks for a variety of languages. Unfortunately, it was reported that MLMs also learn discriminative biases regarding attributes such as gender and race. Because most studies have focused on MLMs in English, the bias of MLMs in other languages has rarely been investigated. Manual annotation of evaluation data for languages other than English has been challenging due to the cost and difficulty in recruiting annotators. Moreover, the existing bias evaluation methods require the stereotypical sentence pairs consisting of the same context with attribute words (e.g. He/She is a nurse). We propose Multilingual Bias Evaluation (MBE) score, to evaluate bias in various languages using only English attribute word lists and parallel corpora between the target language and English without requiring manually annotated data. We evaluated MLMs in eight languages using the MBE and confirmed that gender-related biases are encoded in MLMs for all those languages. We manually created datasets for gender bias in Japanese and Russian to evaluate the validity of the MBE. The results show that the bias scores reported by the MBE significantly correlates with that computed from the above manually created datasets and the existing English datasets for gender bias.

翻译：语言模型(MLMS)是预先通过预测大型社团的蒙面标记来预先培训的,在对各种语言进行自然语言处理任务时成功地使用了大型社团的蒙面标记。不幸的是,据报告,MLMS还学会了对性别和种族等属性的歧视性偏见。由于大多数研究都侧重于英语的MLMS,因此很少调查其他语言的MLMS偏向性。英语以外语言的评估数据手工注释具有挑战性,因为招聘通知员的成本和困难。此外,现有的偏见评价方法要求用带有属性词的同一背景(如He/She是护士)对定型句子。我们建议多语言比亚斯评价(MBE)得分,以评价各种语言的偏见,只使用英语属性词表和目标语言与英语的平行体系,而不需要人工附加说明数据。我们用8种语言用MBE语言对MLMS进行了评价,并证实所有这些语言的性别偏见都已编码为MLMMS。我们手工制作了日语和俄语的性别偏见数据集,以评价来自MBE的现有性别比例数据。我们报告的、以及根据MBE报告的现有性别比例数据报告的数据的正确性。

0

相关内容

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Workshop

【ICIG2021】Latest News & Announcements of the Workshop

中国图象图形学学会CSIG

0+阅读 · 2021年12月20日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

中国图象图形学学会CSIG

0+阅读 · 2021年11月8日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Schr？dinger-Poisson方程守恒DDG方法研究

国家自然科学基金

2+阅读 · 2015年12月31日

纳米晶多铁性材料中子衍射研究

国家自然科学基金

0+阅读 · 2014年12月31日

稀土RE-Mn-Fe体系相图及稀土对MnFe合金磁致伸缩性能的影响

国家自然科学基金

0+阅读 · 2014年12月31日

长链非编码RNA-VEC1340靶定KLF4在血管内皮细胞损伤中的调控及机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

以ED-A(+)Fn为靶点超声纳米分子成像及靶向治疗心脏移植慢性排斥反应

国家自然科学基金

0+阅读 · 2014年12月31日

GOAT/Ghrelin系统在断奶仔猪胃酸分泌中的作用及分子机制

国家自然科学基金

0+阅读 · 2013年12月31日

海洋天然产物Lamellarin D糖基化衍生物的合成与构效关系研究

国家自然科学基金

0+阅读 · 2013年12月31日

雌激素通过ERα介导lncRNA 1200076调节卵巢ERα（+）细胞生物学行为

国家自然科学基金

0+阅读 · 2012年12月31日

长链非编码RNA lncR-RP11作为ceRNA调控Kras表达在胰腺癌中作用及机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

调节骨内P-gp活性对激素性股骨头坏死的影响及其分子机理

国家自然科学基金

0+阅读 · 2011年12月31日

Large Scale Clustering with Variational EM for Gaussian Mixture Models

Arxiv

0+阅读 · 2022年6月21日

OPT: Open Pre-trained Transformer Language Models

Arxiv

0+阅读 · 2022年6月21日

General Framework for Reversible Data Hiding in Texts Based on Masked Language Modeling

Arxiv

0+阅读 · 2022年6月21日

Disentangling the Impacts of Language and Channel Variability on Speech Separation Networks

Arxiv

0+阅读 · 2022年6月20日

Bridge-Tower: Building Bridges Between Encoders in Vision-Language Representation Learning

Arxiv

0+阅读 · 2022年6月17日

MSDF: A General Open-Domain Multi-Skill Dialog Framework

Arxiv

0+阅读 · 2022年6月17日

Pre-Trained Models: Past, Present and Future

Arxiv

19+阅读 · 2021年6月15日

Recent Advances in Deep Learning-based Dialogue Systems

Arxiv

18+阅读 · 2021年5月10日

Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning

Arxiv

13+阅读 · 2021年4月7日

Latent Relation Language Models

Arxiv

21+阅读 · 2019年8月21日

VIP会员

文章信息

相关主题

掩码语言模型化

语言模型化

相关VIP内容

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

【ICCV2025】ESSENTIAL：用于视频类增量学习的情景记忆与语义记忆整合

唯快不破：大型语言模型高效架构综述

《多体环境下定位导航授时（PNT）系统研究》228页

【CMU博士论文】数据驱动决策中的激励、信息与不确定性

相关资讯

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Workshop

【ICIG2021】Latest News & Announcements of the Workshop

中国图象图形学学会CSIG

0+阅读 · 2021年12月20日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

中国图象图形学学会CSIG

0+阅读 · 2021年11月8日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

相关论文

Large Scale Clustering with Variational EM for Gaussian Mixture Models

Arxiv

0+阅读 · 2022年6月21日

OPT: Open Pre-trained Transformer Language Models

Arxiv

0+阅读 · 2022年6月21日

General Framework for Reversible Data Hiding in Texts Based on Masked Language Modeling

Arxiv

0+阅读 · 2022年6月21日

Disentangling the Impacts of Language and Channel Variability on Speech Separation Networks

Arxiv

0+阅读 · 2022年6月20日

Bridge-Tower: Building Bridges Between Encoders in Vision-Language Representation Learning

Arxiv

0+阅读 · 2022年6月17日

MSDF: A General Open-Domain Multi-Skill Dialog Framework

Arxiv

0+阅读 · 2022年6月17日

Pre-Trained Models: Past, Present and Future

Arxiv

19+阅读 · 2021年6月15日

Recent Advances in Deep Learning-based Dialogue Systems

Arxiv

18+阅读 · 2021年5月10日

Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning

Arxiv

13+阅读 · 2021年4月7日

Latent Relation Language Models

Arxiv

21+阅读 · 2019年8月21日

相关基金

Schr？dinger-Poisson方程守恒DDG方法研究

国家自然科学基金

2+阅读 · 2015年12月31日

纳米晶多铁性材料中子衍射研究

国家自然科学基金

0+阅读 · 2014年12月31日

稀土RE-Mn-Fe体系相图及稀土对MnFe合金磁致伸缩性能的影响

国家自然科学基金

0+阅读 · 2014年12月31日

长链非编码RNA-VEC1340靶定KLF4在血管内皮细胞损伤中的调控及机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

以ED-A(+)Fn为靶点超声纳米分子成像及靶向治疗心脏移植慢性排斥反应

国家自然科学基金

0+阅读 · 2014年12月31日

GOAT/Ghrelin系统在断奶仔猪胃酸分泌中的作用及分子机制

国家自然科学基金

0+阅读 · 2013年12月31日

海洋天然产物Lamellarin D糖基化衍生物的合成与构效关系研究

国家自然科学基金

0+阅读 · 2013年12月31日

雌激素通过ERα介导lncRNA 1200076调节卵巢ERα（+）细胞生物学行为

国家自然科学基金

0+阅读 · 2012年12月31日

长链非编码RNA lncR-RP11作为ceRNA调控Kras表达在胰腺癌中作用及机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

调节骨内P-gp活性对激素性股骨头坏死的影响及其分子机理

国家自然科学基金

0+阅读 · 2011年12月31日

微信扫码咨询专知VIP会员