Evaluating Metrics for Bias in Word Embeddings - 专知 (Zhuanzhi) Papers

November 15, 2021

Evaluating Metrics for Bias in Word Embeddings

Sarah Schröder, Alexander Schulz, Philip Kenneweg, Robert Feldhans, Fabian Hinder, Barbara Hammer

Over the last few years, word and sentence embeddings have become established as a text preprocessing step for all kinds of NLP tasks and have improved performance significantly. Unfortunately, it has also been shown that these embeddings inherit various kinds of biases from the training data and thereby pass on biases present in society to NLP solutions. Many papers have attempted to quantify bias in word or sentence embeddings in order to evaluate debiasing methods or compare different embedding models, usually with cosine-based metrics. However, some recent works have raised doubts about these metrics, showing that even when such metrics report low bias, other tests still reveal biases. In fact, a great variety of bias metrics and tests have been proposed in the literature without any consensus on the optimal solution. Yet we lack works that evaluate bias metrics on a theoretical level or elaborate on the advantages and disadvantages of different bias metrics. In this work, we explore different cosine-based bias metrics. We formalize a bias definition based on ideas from previous works and derive conditions for bias metrics. Furthermore, we thoroughly investigate the existing cosine-based metrics and their limitations to show why these metrics can fail to report biases in some cases. Finally, we propose a new metric, SAME, to address the shortcomings of existing metrics, and we mathematically prove that SAME behaves appropriately.
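To make the phrase "cosine-based metric" concrete, the following is a minimal sketch of one common building block of such metrics: the difference between a word's mean cosine similarity to two attribute sets, in the style of the WEAT association score. It is an illustration only and is not the SAME metric proposed in the paper, whose definition is not given in this abstract. The function names and the two-dimensional toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, attrs_a, attrs_b):
    """Mean cosine similarity of w to attribute set A minus that to set B.

    A positive value means w lies closer (in angle) to A, a negative value
    closer to B, and zero means no measured preference.
    """
    mean_a = sum(cosine(w, a) for a in attrs_a) / len(attrs_a)
    mean_b = sum(cosine(w, b) for b in attrs_b) / len(attrs_b)
    return mean_a - mean_b


# Hypothetical toy embeddings (assumption: real embeddings have hundreds of dimensions).
attrs_a = [[1.0, 0.0]]   # stand-in for attribute group A
attrs_b = [[0.0, 1.0]]   # stand-in for attribute group B

biased_word = [1.0, 0.0]    # aligned with group A
neutral_word = [1.0, 1.0]   # equidistant from both groups

biased_score = association(biased_word, attrs_a, attrs_b)
neutral_score = association(neutral_word, attrs_a, attrs_b)
```

In this toy setup the word aligned with group A receives the maximal score, while the equidistant word receives a score of zero. The abstract's concern is that a score near zero on a measure of this kind does not by itself guarantee the absence of bias.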
