Reference texts such as encyclopedias and news articles can manifest biased language when objective reporting is replaced by subjective writing. Existing methods for detecting bias mostly rely on annotated data to train machine learning models. However, low annotator agreement and limited comparability are substantial drawbacks of available media bias corpora. To evaluate data collection options, we collect and compare labels obtained from two popular crowdsourcing platforms. Our results demonstrate the insufficient data quality of existing crowdsourcing approaches, underlining the need for a trained-expert framework to gather a more reliable dataset. By creating such a framework and gathering a first dataset, we improve inter-annotator agreement from Krippendorff's $\alpha = 0.144$ (crowdsourcing labels) to $\alpha = 0.419$ (expert labels). We conclude that detailed annotator training increases data quality, improving the performance of existing bias detection systems. We will continue to extend our dataset in the future.
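For concreteness, the following is a minimal sketch of how Krippendorff's $\alpha$, the agreement coefficient reported above, can be computed for nominal labels via the standard coincidence-matrix formulation ($\alpha = 1 - D_o / D_e$). The function name and the toy ratings are purely illustrative and are not taken from our data or code.

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    `units` is a list of lists: one inner list per annotated unit,
    containing the labels assigned by the annotators who rated it
    (missing ratings are simply omitted).
    """
    # Build the coincidence matrix: each ordered label pair within a
    # unit of m ratings contributes weight 1 / (m - 1).
    coincidence = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # units with fewer than two ratings carry no agreement information
        for c, k in permutations(labels, 2):
            coincidence[(c, k)] += 1.0 / (m - 1)

    # Marginal totals per label and overall total.
    n_c = Counter()
    for (c, _), w in coincidence.items():
        n_c[c] += w
    n = sum(n_c.values())

    # Observed vs. expected disagreement (nominal metric: delta = 1 iff c != k).
    d_o = sum(w for (c, k), w in coincidence.items() if c != k) / n
    d_e = sum(n_c[c] * n_c[k] for c in n_c for k in n_c if c != k) / (n * (n - 1))
    return 1.0 - d_o / d_e

# Toy example: three annotators labelling four sentences as biased (1) or not (0).
ratings = [[1, 1, 1], [0, 0, 1], [0, 0, 0], [1, 0, 1]]
print(round(krippendorff_alpha_nominal(ratings), 3))  # -> 0.389
```

Values near 1 indicate near-perfect agreement, values near 0 indicate agreement at chance level, which is why the rise from $\alpha = 0.144$ to $\alpha = 0.419$ reflects a substantial gain in label reliability.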