光滑第一顺序共生的文本公司 (Measuring Societal Biases from Text Corpora with Smoothed First-Order Co-occurrence) - 专知论文

会员服务 ·

0

有偏 · 向量化 · 平滑 · 词向量表示 · 估计/估计量 ·

2021 年 3 月 16 日

Measuring Societal Biases from Text Corpora with Smoothed First-Order Co-occurrence

翻译：光滑第一顺序共生的文本公司

Navid Rekabsaz,Robert West,James Henderson,Allan Hanbury

from arxiv, To be appeared in ICWSM 2021

Text corpora are widely used resources for measuring societal biases and stereotypes. The common approach to measuring such biases using a corpus is by calculating the similarities between the embedding vector of a word (like nurse) and the vectors of the representative words of the concepts of interest (such as genders). In this study, we show that, depending on what one aims to quantify as bias, this commonly-used approach can introduce non-relevant concepts into bias measurement. We propose an alternative approach to bias measurement utilizing the smoothed first-order co-occurrence relations between the word and the representative concept words, which we derive by reconstructing the co-occurrence estimates inherent in word embedding models. We compare these approaches by conducting several experiments on the scenario of measuring gender bias of occupational words, according to an English Wikipedia corpus. Our experiments show higher correlations of the measured gender bias with the actual gender bias statistics of the U.S. job market - on two collections and with a variety of word embedding models - using the first - order approach in comparison with the vector similarity-based approaches. The first-order approach also suggests a more severe bias towards female in a few specific occupations than the other approaches.

翻译：文本公司是用来衡量社会偏见和定型观念的广泛资源。用一个字体衡量这种偏见的共同方法是计算一个字(如护士)嵌入矢量与利益概念(如性别)代表字矢量之间的相似性。在本研究中,我们显示,根据什么目的将性别偏见量化为偏见,这种通常使用的方法可以将非相关概念引入偏见计量。我们建议了一种衡量偏见的替代方法,利用单词和代表概念字词之间平滑的第一阶共生关系,我们通过重新构建单词嵌入模型所固有的共同生出的估计值来得出。我们把这些方法进行比较,方法是根据一个英文维基百科对衡量职业词性别偏见的设想进行若干次实验。我们的实验表明,衡量性别偏见与美国职业市场两性偏见的实际统计数字在两种收藏上和各种词嵌入模式上的关系更加密切,使用第一阶方法与媒介类似方法相比较。第一阶方法还表明,在少数具体职业中,相对于其他方法而言,对女性的偏差更为严重。

0

相关内容

不可错过！UIUC最新《统计强化学习》课程！

专知会员服务

53+阅读 · 2020年9月7日

【知识图谱嵌入补全综述论文】embedding models for knowledge base completion

【知识图谱嵌入补全综述论文】embedding models for knowledge base completion

专知会员服务

102+阅读 · 2020年4月25日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

专知会员服务

15+阅读 · 2019年10月23日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

LibRec 精选：连通知识图谱与推荐系统

LibRec 精选：连通知识图谱与推荐系统

LibRec智能推荐

3+阅读 · 2018年8月9日

笔记 | Sentiment Analysis

笔记 | Sentiment Analysis

黑龙江大学自然语言处理实验室

10+阅读 · 2018年5月6日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【推荐】自然语言处理（NLP）指南

【推荐】自然语言处理（NLP）指南

机器学习研究会

35+阅读 · 2017年11月17日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

Consensus-Aware Visual-Semantic Embedding for Image-Text Matching

Arxiv

4+阅读 · 2020年7月17日

Scene-based Factored Attention for Image Captioning

Arxiv

4+阅读 · 2019年8月7日

Inferring Concept Hierarchies from Text Corpora via Hyperbolic Embeddings

Inferring Concept Hierarchies from Text Corpora via Hyperbolic Embeddings

Arxiv

8+阅读 · 2019年2月3日

Unsupervised Multilingual Word Embeddings

Arxiv

3+阅读 · 2018年8月27日

Theme-weighted Ranking of Keywords from Text Documents using Phrase Embeddings

Theme-weighted Ranking of Keywords from Text Documents using Phrase Embeddings

Arxiv

5+阅读 · 2018年7月16日

Global Relation Embedding for Relation Extraction

Arxiv

10+阅读 · 2018年4月19日

Learning a Robust Society of Tracking Parts using Co-occurrence Constraints

Arxiv

4+阅读 · 2018年4月5日

Human Interaction with Recommendation Systems

Arxiv

6+阅读 · 2018年3月28日

A Comparison of Word Embeddings for the Biomedical Natural Language Processing

Arxiv

3+阅读 · 2018年2月1日

What Level of Quality can Neural Machine Translation Attain on Literary Text?

Arxiv

5+阅读 · 2018年1月15日

VIP会员

文章信息

相关主题

词向量表示

估计/估计量

相关VIP内容

不可错过！UIUC最新《统计强化学习》课程！

专知会员服务

53+阅读 · 2020年9月7日

【知识图谱嵌入补全综述论文】embedding models for knowledge base completion

【知识图谱嵌入补全综述论文】embedding models for knowledge base completion

专知会员服务

102+阅读 · 2020年4月25日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

专知会员服务

15+阅读 · 2019年10月23日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

从社会学实验到行为仿真：理解基于Agent的观点动力学建模思维

中英文版《GPT-5 System Card速览》报告

ACL 2025 | 大模型结构化知识提示的泛化能力研究

【普林斯顿博士论文】大型模型的高效推理

相关资讯

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

LibRec 精选：连通知识图谱与推荐系统

LibRec 精选：连通知识图谱与推荐系统

LibRec智能推荐

3+阅读 · 2018年8月9日

笔记 | Sentiment Analysis

笔记 | Sentiment Analysis

黑龙江大学自然语言处理实验室

10+阅读 · 2018年5月6日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【推荐】自然语言处理（NLP）指南

【推荐】自然语言处理（NLP）指南

机器学习研究会

35+阅读 · 2017年11月17日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

Consensus-Aware Visual-Semantic Embedding for Image-Text Matching

Arxiv

4+阅读 · 2020年7月17日

Scene-based Factored Attention for Image Captioning

Arxiv

4+阅读 · 2019年8月7日

Inferring Concept Hierarchies from Text Corpora via Hyperbolic Embeddings

Inferring Concept Hierarchies from Text Corpora via Hyperbolic Embeddings

Arxiv

8+阅读 · 2019年2月3日

Unsupervised Multilingual Word Embeddings

Arxiv

3+阅读 · 2018年8月27日

Theme-weighted Ranking of Keywords from Text Documents using Phrase Embeddings

Theme-weighted Ranking of Keywords from Text Documents using Phrase Embeddings

Arxiv

5+阅读 · 2018年7月16日

Global Relation Embedding for Relation Extraction

Arxiv

10+阅读 · 2018年4月19日

Learning a Robust Society of Tracking Parts using Co-occurrence Constraints

Arxiv

4+阅读 · 2018年4月5日

Human Interaction with Recommendation Systems

Arxiv

6+阅读 · 2018年3月28日

A Comparison of Word Embeddings for the Biomedical Natural Language Processing

Arxiv

3+阅读 · 2018年2月1日

What Level of Quality can Neural Machine Translation Attain on Literary Text?

Arxiv

5+阅读 · 2018年1月15日

微信扫码咨询专知VIP会员