As social media becomes a hotbed for the spread of misinformation, the crucial task of rumor detection has witnessed promising advances fostered by open-source benchmark datasets. Although these datasets are widely used, we find that they suffer from spurious correlations, which existing studies overlook and which lead to severe overestimation of rumor detection performance. The spurious correlations stem from three causes: (1) event-based data collection and labeling schemes assign the same veracity label to multiple highly similar posts from the same underlying event; (2) merging multiple data sources spuriously relates source identities to veracity labels; and (3) labeling bias. In this paper, we closely investigate three of the most popular rumor detection benchmark datasets (i.e., Twitter15, Twitter16 and PHEME), and propose event-separated rumor detection as a solution that eliminates these spurious cues. Under the event-separated setting, the accuracy of existing state-of-the-art models drops by over 40%, becoming merely comparable to that of a simple neural classifier. To better address this task, we propose Publisher Style Aggregation (PSA), a generalizable approach that aggregates publisher posting records to learn writing style and veracity stance. Extensive experiments demonstrate that our method outperforms existing baselines in effectiveness, efficiency and generalizability.
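To make the event-separated setting concrete, below is a minimal sketch of splitting a dataset so that no event contributes posts to both train and test, which removes the event-level near-duplicates that inflate accuracy under a random post-level split. The `Post` record, its field names, and `event_separated_split` are hypothetical illustrations, not the paper's code.

```python
# Minimal sketch of event-separated splitting (assumes each post record
# carries an identifier of its underlying event; names are hypothetical).
import random
from typing import List, NamedTuple, Tuple

class Post(NamedTuple):
    text: str
    label: str   # veracity label, e.g. "rumor" / "non-rumor"
    event: str   # identifier of the underlying event

def event_separated_split(
    posts: List[Post], test_ratio: float = 0.2, seed: int = 0
) -> Tuple[List[Post], List[Post]]:
    """Split posts so that no event appears in both train and test.

    A random post-level split leaves highly similar posts from the same
    event on both sides, letting a model exploit event-specific cues
    instead of learning veracity signals.
    """
    events = sorted({p.event for p in posts})
    random.Random(seed).shuffle(events)
    n_test = max(1, int(len(events) * test_ratio))
    test_events = set(events[:n_test])
    train = [p for p in posts if p.event not in test_events]
    test = [p for p in posts if p.event in test_events]
    return train, test
```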
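The abstract describes PSA only at a high level. As a rough illustration under stated assumptions, one simple way to aggregate a publisher's posting records is to mean-pool per-post embeddings into a publisher-level style vector and concatenate it with the representation of the post being classified; the aggregation operator and all names below are assumptions for illustration, not the paper's definition of PSA.

```python
# Illustrative aggregation of publisher posting records (assumed design,
# not the paper's exact method).
import numpy as np

def publisher_style_vector(history_embs: np.ndarray) -> np.ndarray:
    """Mean-pool embeddings of a publisher's historical posts
    (shape: [n_posts, dim]) into a single style vector."""
    return history_embs.mean(axis=0)

def psa_features(post_emb: np.ndarray, history_embs: np.ndarray) -> np.ndarray:
    """Concatenate a post's embedding with its publisher's style vector,
    yielding the input for a downstream veracity classifier."""
    return np.concatenate([post_emb, publisher_style_vector(history_embs)])
```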