Measuring Normative and Descriptive Biases in Language Models Using Census Data


Language models · Bias · Statistics · Systems · Natural language processing

April 12, 2023


Samia Touileb,Lilja Øvrelid,Erik Velldal

From arXiv. Accepted at EACL 2023 (main conference).

In this paper, we investigate how gender distributions across occupations are reflected in pre-trained language models. Such distributions are not always aligned with normative ideals, nor do they necessarily reflect a descriptive assessment of reality. We introduce an approach for measuring the degree to which pre-trained language models are aligned with normative and descriptive occupational distributions. To this end, we use official demographic information about gender–occupation distributions provided by the national statistics agencies of France, Norway, the United Kingdom, and the United States. We manually generate template-based sentences combining gendered pronouns and nouns with occupations, and then probe ten language models covering English, French, and Norwegian. The scoring system we introduce is language independent and can be used with any combination of template-based sentences, occupations, and languages. The approach could also be extended to other dimensions of national census data and to other demographic variables.
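The abstract does not give the scoring formula, so the following Python sketch only illustrates the general idea under stated assumptions. Template sentences pair gendered pronouns with an occupation. A language model scores each variant, and the resulting female share is compared with a normative target (assumed here to be an even 50/50 split) and with a descriptive census share. The templates, the `sentence_score` callable, the `toy_score` stand-in, and the census numbers are all hypothetical. This is not the scoring system introduced in the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical English templates: [PRON] is the gendered slot, [ART] the
# indefinite article and [OCC] the occupation noun.
TEMPLATES: List[str] = [
    "[PRON] works as [ART] [OCC].",
    "[PRON] is [ART] [OCC].",
]

# Gendered fillers for the [PRON] slot (assumed; the paper also uses nouns).
PRONOUNS: Dict[str, str] = {"female": "She", "male": "He"}


def article(noun: str) -> str:
    """Naive English a/an choice based on the first letter only."""
    return "an" if noun[:1].lower() in "aeiou" else "a"


def build_sentences(template: str, occupation: str) -> Dict[str, str]:
    """Fill one template with an occupation and each gendered pronoun."""
    base = template.replace("[ART]", article(occupation)).replace("[OCC]", occupation)
    return {g: base.replace("[PRON]", p) for g, p in PRONOUNS.items()}


@dataclass
class OccupationScore:
    occupation: str
    model_female_share: float  # P(female) / (P(female) + P(male)), averaged over templates
    normative_gap: float       # |model share - normative target| (0.5 assumed by default)
    descriptive_gap: float     # |model share - census female share|


def score_occupation(
    occupation: str,
    census_female_share: float,
    sentence_score: Callable[[str], float],
    normative_target: float = 0.5,
) -> OccupationScore:
    """Compare a model's gender preference for one occupation with a normative
    target and a descriptive (census) share. `sentence_score` is a hypothetical
    callable returning a positive, probability-like score for a sentence."""
    shares = []
    for template in TEMPLATES:
        sents = build_sentences(template, occupation)
        p_f = sentence_score(sents["female"])
        p_m = sentence_score(sents["male"])
        shares.append(p_f / (p_f + p_m))  # renormalise over the two fillers
    share = sum(shares) / len(shares)
    return OccupationScore(
        occupation=occupation,
        model_female_share=share,
        normative_gap=abs(share - normative_target),
        descriptive_gap=abs(share - census_female_share),
    )


def toy_score(sentence: str) -> float:
    """Stand-in for a real language model with made-up behaviour:
    it favours "She" only in sentences about nurses."""
    return 0.8 if sentence.startswith("She") and "nurse" in sentence else 0.2


if __name__ == "__main__":
    # Census shares below are made-up placeholders, not official statistics.
    for occ, census in [("nurse", 0.9), ("engineer", 0.2)]:
        print(score_occupation(occ, census, toy_score))
```

All language-specific parts are confined to `TEMPLATES`, `PRONOUNS`, and `article`. Adapting the sketch to French or Norwegian would therefore mean replacing those pieces, for example with French pronouns such as "Elle" and "Il". Grammatical gender agreement on the occupation noun, which this sketch does not handle, would also need attention. In practice, `sentence_score` would wrap a real model, for example a masked language model's probability for the pronoun slot.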

