2022 年 6 月 28 日

Towards Lexical Gender Inference: A Scalable Methodology using Online Databases

Marion Bartl, Susan Leavy

From arXiv: 12 pages, 4 tables, 2 figures. The article was published under a different title in the Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion at ACL 2022.

This paper presents a new method for automatically detecting words with lexical gender in large-scale language datasets. Currently, the evaluation of gender bias in natural language processing relies on manually compiled lexicons of gendered expressions, such as pronouns ('he', 'she', etc.) and nouns with lexical gender ('mother', 'boyfriend', 'policewoman', etc.). However, manually compiled lists become static if they are not periodically updated, and they often involve value judgments by individual annotators and researchers. Moreover, terms not included in a list fall outside the range of analysis. To address these issues, we devised a scalable, dictionary-based method to automatically detect lexical gender that can provide a dynamic, up-to-date analysis with high coverage. Our approach reaches over 80% accuracy in determining the lexical gender of words, both on nouns retrieved randomly from a Wikipedia sample and on a list of gendered words used in previous research.
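The abstract does not spell out the algorithm, so the following is only a minimal sketch of what a dictionary-based lexical gender check could look like, assuming that a word's gender can be guessed from gendered cue words in its dictionary definition. The cue-word sets, the `TOY_DEFINITIONS` table (a stand-in for definitions retrieved from an online dictionary), and the function names `lexical_gender` and `accuracy` are all hypothetical and are not taken from the paper.

```python
import re
from collections import Counter

# Hypothetical cue words; an assumption for illustration, not the paper's lists.
MASCULINE_CUES = {"man", "men", "male", "boy", "father", "son", "husband", "brother"}
FEMININE_CUES = {"woman", "women", "female", "girl", "mother", "daughter", "wife", "sister"}

# Toy stand-in for definitions that would be fetched from an online dictionary.
TOY_DEFINITIONS = {
    "mother": "a female parent",
    "boyfriend": "a male partner in a romantic relationship",
    "policewoman": "a woman who is a member of a police force",
    "teacher": "a person whose job is to teach students",
}


def lexical_gender(word, definitions=TOY_DEFINITIONS):
    """Label a word 'masculine', 'feminine', 'neutral', or 'unknown'."""
    definition = definitions.get(word.lower())
    if definition is None:
        return "unknown"  # the word has no entry in the dictionary
    # Whole-word tokens, so 'female' is never mistaken for 'male'.
    tokens = re.findall(r"[a-z]+", definition.lower())
    counts = Counter()
    for token in tokens:
        if token in MASCULINE_CUES:
            counts["masculine"] += 1
        elif token in FEMININE_CUES:
            counts["feminine"] += 1
    if counts["masculine"] > counts["feminine"]:
        return "masculine"
    if counts["feminine"] > counts["masculine"]:
        return "feminine"
    return "neutral"  # no cues, or a tie between the two cue sets


def accuracy(gold_labels, definitions=TOY_DEFINITIONS):
    """Fraction of words whose predicted label matches the gold label."""
    correct = sum(
        lexical_gender(word, definitions) == label
        for word, label in gold_labels.items()
    )
    return correct / len(gold_labels)


if __name__ == "__main__":
    for w in ["mother", "boyfriend", "policewoman", "teacher", "xylophone"]:
        print(w, lexical_gender(w))
```

Because the labels come from dictionary entries rather than a fixed word list, swapping `TOY_DEFINITIONS` for a live dictionary lookup would, under these assumptions, extend coverage to any word the dictionary defines; that is the sense in which such a method can stay up to date without manual curation.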
