Detecting Cross-Geographic Biases in Toxicity Modeling on Social Media - Zhuanzhi Papers


September 29, 2021

Detecting Cross-Geographic Biases in Toxicity Modeling on Social Media


Sayan Ghosh, Dylan Baker, David Jurgens, Vinodkumar Prabhakaran

From arXiv; Proceedings of the 7th Workshop on Noisy User-generated Text (W-NUT)

Online social media platforms increasingly rely on Natural Language Processing (NLP) techniques to detect abusive content at scale in order to mitigate the harms it causes to their users. However, these techniques suffer from various sampling and association biases present in training data, often resulting in sub-par performance on content relevant to marginalized groups, potentially furthering disproportionate harms towards them. Studies on such biases so far have focused on only a handful of axes of disparities and subgroups that have annotations/lexicons available. Consequently, biases concerning non-Western contexts are largely ignored in the literature. In this paper, we introduce a weakly supervised method to robustly detect lexical biases in broader geocultural contexts. Through a case study on a publicly available toxicity detection model, we demonstrate that our method identifies salient groups of cross-geographic errors, and, in a follow-up, demonstrate that these groupings reflect human judgments of offensive and inoffensive language in those geographic contexts. We also conduct an analysis of a model trained on a dataset with ground truth labels to better understand these biases, and present preliminary mitigation experiments.
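The abstract does not spell out the detection procedure. As a rough, hypothetical illustration of what a term-level (lexical) bias probe can look like, the Python sketch below ranks terms by how far the average toxicity score of texts containing them sits above the corpus-wide average. It is a simple stand-in, not the authors' weakly supervised method; the function names `tokenize` and `term_score_gap`, the `min_count` threshold, and the assumption that scores are precomputed probabilities from some existing toxicity classifier are all illustrative choices.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (hypothetical tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def term_score_gap(scored_texts, min_count=2):
    """Rank terms by how far the mean classifier score of texts containing
    them sits above the corpus-wide mean score.

    scored_texts: list of (text, score) pairs; score is assumed to be a
        toxicity probability in [0, 1] from some existing classifier.
    min_count: hypothetical threshold that drops rare terms, whose means are noisy.
    """
    if not scored_texts:
        return []
    overall = sum(score for _, score in scored_texts) / len(scored_texts)
    totals = defaultdict(float)
    counts = defaultdict(int)
    for text, score in scored_texts:
        # Count each term at most once per text so repeated words do not dominate.
        for term in set(tokenize(text)):
            totals[term] += score
            counts[term] += 1
    gaps = {
        term: totals[term] / counts[term] - overall
        for term in counts
        if counts[term] >= min_count
    }
    # Largest positive gap first: these terms co-occur with high model scores.
    return sorted(gaps.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy scores for illustration only; not data from the paper.
    toy = [("foo bar", 0.9), ("foo baz", 0.8), ("bar qux", 0.1), ("baz qux", 0.2)]
    for term, gap in term_score_gap(toy):
        print(f"{term}\t{gap:+.2f}")
```

Because model scores stand in for labels here, a large positive gap only flags a candidate term for review. Running the function separately on texts from each geographic context, or checking per-term error rates against ground-truth labels where they exist, would be needed before treating a term as evidence of bias.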


Related content

A Survey of Text Generation Based on Pre-trained Language Models (Zhuanzhi Member Services, October 15, 2021)

A Survey on Natural Language Processing for Fake News Detection (Zhuanzhi Member Services, February 12, 2020)

[Cross-lingual BERT model collection] Transfer learning is increasingly going multilingual with language-specific BERT models (Zhuanzhi Member Services, January 30, 2020)

[Tsinghua-Tencent, AAAI 2020] Rumor Detection on Social Media with Bi-Directional Graph Convolutional Networks (Zhuanzhi Member Services, January 20, 2020)

[Highly recommended NeurIPS 2019 paper] vGraph: A Generative Model for Joint Community Detection and Node Representational Learning (Zhuanzhi Member Services, December 17, 2019)

[NLP | Recommended reading] Speech and Language Processing (3rd ed. draft) (Zhuanzhi Member Services, November 24, 2019)

[ACL 2019 Tutorials] Advances in Argument Mining (Zhuanzhi Member Services, November 18, 2019)

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation (Zhuanzhi Member Services, October 17, 2019)

Latest reinforcement learning tutorial, 17-page PDF (Zhuanzhi Member Services, October 11, 2019)

[Survey] Scene Text Detection and Recognition with Deep Learning (Zhuanzhi Member Services, October 10, 2019)

Related news

Hierarchically Structured Meta-learning (CreateAMind, May 22, 2019)

Transferring Knowledge across Learning Processes (CreateAMind, May 18, 2019)

A Technical Overview of AI & ML in 2018 & Trends for 2019 (待字闺中, December 24, 2018)

A discussion of the assumptions behind disentanglement (CreateAMind, December 10, 2018)

disentangled-representation-papers (CreateAMind, September 12, 2018)

Hierarchical Disentangled Representations (CreateAMind, April 15, 2018)

[Paper recommendations] Six recent machine translation papers: survey, convolutional encoder-decoder networks, word translation, autoencoders, neural phrases, RNNs (Zhuanzhi, February 19, 2018)

[Paper recommendations] Five recent information extraction (IE) papers: open IE, incomplete information, active learning, Vietnamese, dependency parsing (Zhuanzhi, February 2, 2018)

Artificial Intelligence | 9 calls for papers from international conferences and SCI journals (Call4Papers, January 12, 2018)

[Recommended] Hands-on deep sentiment analysis with MXNet (机器学习研究会, October 4, 2017)

Related papers

Reply to Comment on "TVOR: Finding Discrete Total Variation Outliers among Histograms" (arXiv, November 23, 2021)

Is Dynamic Rumor Detection on social media Viable? An Unsupervised Perspective (arXiv, November 23, 2021)

The ComMA Dataset V0.2: Annotating Aggression and Bias in Multilingual Social Media Discourse (arXiv, November 19, 2021)

Toxicity Detection can be Sensitive to the Conversational Context (arXiv, November 19, 2021)

Learning Neural Models for Natural Language Processing in the Face of Distributional Shift (arXiv, September 3, 2021)

On Disentangled Representations Learned From Correlated Data (arXiv, July 16, 2021)

Rumor Detection on Social Media with Bi-Directional Graph Convolutional Networks (arXiv, January 17, 2020)

MSc Dissertation: Exclusive Row Biclustering for Gene Expression Using a Combinatorial Auction Approach (arXiv, September 13, 2018)

Stylistic Variation in Social Media Part-of-Speech Tagging (arXiv, April 19, 2018)

Handling Homographs in Neural Machine Translation (arXiv, March 28, 2018)
