Despite growing concerns around gender bias in NLP models used in algorithmic hiring, there is little empirical work studying the extent and nature of gendered language in resumes. Using a corpus of 709k resumes from IT firms, we train a series of models to classify the gender of the applicant, thereby measuring the extent of gendered information encoded in resumes. We also investigate whether gender can be obfuscated from resumes by removing gender identifiers, hobbies, the gender subspace in embedding models, and so on. We find that a significant amount of gendered information remains in resumes even after obfuscation. A simple Tf-Idf model can learn to classify gender with AUROC=0.75, and more sophisticated transformer-based models achieve AUROC=0.8. We further find that gender-predictive values correlate only weakly with the gender direction of embeddings, meaning that what is predictive of gender goes well beyond what is "gendered" in the masculine/feminine sense. We discuss the algorithmic bias and fairness implications of these findings in the hiring context.
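To make the baseline concrete, the following is a minimal sketch (not the authors' code) of the kind of simple Tf-Idf pipeline described above: resume text is vectorized with TF-IDF, fed to a linear classifier, and evaluated with AUROC. The resume snippets, labels, and train/test split here are illustrative placeholders, not the 709k-resume corpus.

```python
# Hedged illustration: TF-IDF features + logistic regression as a gender classifier,
# scored with AUROC. All data below is synthetic placeholder text, not real resumes.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

resumes = [
    "software engineer experienced in java and distributed systems",
    "project coordinator volunteer work and event planning",
    "devops engineer kubernetes aws ci cd pipelines",
    "hr assistant recruitment onboarding employee relations",
]  # placeholder documents
labels = [0, 1, 0, 1]  # placeholder binary gender labels

X_train, X_test, y_train, y_test = train_test_split(
    resumes, labels, test_size=0.5, stratify=labels, random_state=0
)

# The "simple Tf-Idf model": bag-of-words TF-IDF representation + linear classifier
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# AUROC on held-out resumes, analogous to the reported AUROC=0.75 baseline
scores = model.predict_proba(X_test)[:, 1]
print("AUROC:", roc_auc_score(y_test, scores))
```

On a realistic corpus, the same pipeline would simply swap in the full set of resume texts and gender labels; the point of the baseline is that even such a shallow lexical model recovers substantial gendered signal.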