跨线:对事实核查模型进行偏向性分析的矛盾数据增加方法 (CrossAug: A Contrastive Data Augmentation Method for Debiasing Fact Verification Models) - 专知论文

会员服务 ·

0

contrastive · MoDELS · Performer · 数据增强 · Pair ·

2021 年 9 月 30 日

CrossAug: A Contrastive Data Augmentation Method for Debiasing Fact Verification Models

翻译：跨线:对事实核查模型进行偏向性分析的矛盾数据增加方法

Minwoo Lee,Seungpil Won,Juae Kim,Hwanhee Lee,Cheoneum Park,Kyomin Jung

from arxiv, 5 pages, accepted as a short paper at CIKM 2021

Fact verification datasets are typically constructed using crowdsourcing techniques due to the lack of text sources with veracity labels. However, the crowdsourcing process often produces undesired biases in data that cause models to learn spurious patterns. In this paper, we propose CrossAug, a contrastive data augmentation method for debiasing fact verification models. Specifically, we employ a two-stage augmentation pipeline to generate new claims and evidences from existing samples. The generated samples are then paired cross-wise with the original pair, forming contrastive samples that facilitate the model to rely less on spurious patterns and learn more robust representations. Experimental results show that our method outperforms the previous state-of-the-art debiasing technique by 3.6% on the debiased extension of the FEVER dataset, with a total performance boost of 10.13% from the baseline. Furthermore, we evaluate our approach in data-scarce settings, where models can be more susceptible to biases due to the lack of training data. Experimental results demonstrate that our approach is also effective at debiasing in these low-resource conditions, exceeding the baseline performance on the Symmetric dataset with just 1% of the original data.

翻译：事实核查数据集通常使用众包技术构建,因为缺少具有真实标签的文本源。然而,众包过程往往在数据中产生不理想的偏差,导致模型学习虚假模式。在本文件中,我们提议CrossAug,这是一个反偏向事实核查模型的对比性数据增强方法。具体地说,我们使用一个两阶段增强性能管道,从现有样品中产生新的主张和证据。然后,产生的样品与原始样品配对,形成对比样本,为模型较少依赖虚假模式和学习更强的演示提供便利。实验结果显示,我们的方法在Fever数据集的脱轨扩展中,比先前的艺术减偏向状态技术高出3.6%,比基线值增加10.13%。此外,我们评估了我们在数据偏差环境中的方法,因为缺乏培训数据,模型更容易受到偏差。实验结果表明,我们的方法在这些低资源条件下也有效减少了偏差,超过Symrial数据原位的原位性能,比原位数据高出3.6%。

0

相关内容

contrastive

近期必读的5篇顶会WWW 2021【对比学习（CL）】相关论文和代码

专知会员服务

51+阅读 · 2021年6月17日

【AAAI2021】对比聚类，Contrastive Clustering

【AAAI2021】对比聚类，Contrastive Clustering

专知会员服务

78+阅读 · 2021年1月30日

【MIT】反偏差对比学习，Debiased Contrastive Learning

【MIT】反偏差对比学习，Debiased Contrastive Learning

专知会员服务

91+阅读 · 2020年7月4日

【ACL2020】Span-ConveRT：预训练对话表示小样本跨度提取，Span-ConveRT: Few-shot Span Extraction for Dialog with Pretrained Conversational Representations

【ACL2020】Span-ConveRT：预训练对话表示小样本跨度提取，Span-ConveRT: Few-shot Span Extraction for Dialog with Pretrained Conversational Representations

专知会员服务

17+阅读 · 2020年5月19日

【ACL2020-Google】BLEURT:一种基于迁移学习的自然语言生成度量

【ACL2020-Google】BLEURT:一种基于迁移学习的自然语言生成度量

专知会员服务

20+阅读 · 2020年5月12日

【Google】监督对比学习，Supervised Contrastive Learning

【Google】监督对比学习，Supervised Contrastive Learning

专知会员服务

75+阅读 · 2020年4月24日

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

专知会员服务

41+阅读 · 2020年4月11日

【SIGMOD2020】稀疏数据半监督学习的分解图表示，Factorized Graph Representations for Semi-Supervised Learning from Sparse Data

【SIGMOD2020】稀疏数据半监督学习的分解图表示，Factorized Graph Representations for Semi-Supervised Learning from Sparse Data

专知会员服务

15+阅读 · 2020年3月7日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

【论文推荐】最新五篇命名实体识别相关论文—深度主动学习、Lattice LSTM、混合马尔可夫CRF

【论文推荐】最新五篇命名实体识别相关论文—深度主动学习、Lattice LSTM、混合马尔可夫CRF

专知

26+阅读 · 2018年5月22日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

Data augmentation by morphological mixup for solving Raven's Progressive Matrices

Arxiv

0+阅读 · 2021年11月19日

Multi-view Contrastive Graph Clustering

Arxiv

13+阅读 · 2021年10月22日

Dense Contrastive Visual-Linguistic Pretraining

Arxiv

6+阅读 · 2021年9月24日

Re-distributing Biased Pseudo Labels for Semi-supervised Semantic Segmentation: A Baseline Investigation

Arxiv

4+阅读 · 2021年7月26日

Graph Contrastive Learning with Adaptive Augmentation

Arxiv

5+阅读 · 2021年2月15日

Data Augmentation for Graph Neural Networks

Arxiv

38+阅读 · 2020年12月2日

Contrastive Clustering

Arxiv

31+阅读 · 2020年9月21日

Data Augmentation using Pre-trained Transformer Models

Arxiv

17+阅读 · 2020年3月4日

Unsupervised Data Augmentation for Consistency Training

Arxiv

5+阅读 · 2019年7月10日

Multi-task Learning for Universal Sentence Embeddings: A Thorough Evaluation using Transfer and Auxiliary Tasks

Multi-task Learning for Universal Sentence Embeddings: A Thorough Evaluation using Transfer and Auxiliary Tasks

Arxiv

3+阅读 · 2018年8月16日

VIP会员

文章信息

相关主题

相关VIP内容

近期必读的5篇顶会WWW 2021【对比学习（CL）】相关论文和代码

专知会员服务

51+阅读 · 2021年6月17日

【AAAI2021】对比聚类，Contrastive Clustering

【AAAI2021】对比聚类，Contrastive Clustering

专知会员服务

78+阅读 · 2021年1月30日

【MIT】反偏差对比学习，Debiased Contrastive Learning

【MIT】反偏差对比学习，Debiased Contrastive Learning

专知会员服务

91+阅读 · 2020年7月4日

【ACL2020】Span-ConveRT：预训练对话表示小样本跨度提取，Span-ConveRT: Few-shot Span Extraction for Dialog with Pretrained Conversational Representations

【ACL2020】Span-ConveRT：预训练对话表示小样本跨度提取，Span-ConveRT: Few-shot Span Extraction for Dialog with Pretrained Conversational Representations

专知会员服务

17+阅读 · 2020年5月19日

【ACL2020-Google】BLEURT:一种基于迁移学习的自然语言生成度量

【ACL2020-Google】BLEURT:一种基于迁移学习的自然语言生成度量

专知会员服务

20+阅读 · 2020年5月12日

【Google】监督对比学习，Supervised Contrastive Learning

【Google】监督对比学习，Supervised Contrastive Learning

专知会员服务

75+阅读 · 2020年4月24日

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

强化学习的对比无监督表示，CURL: Contrastive Unsupervised Representations for Reinforcement Learning

专知会员服务

41+阅读 · 2020年4月11日

【SIGMOD2020】稀疏数据半监督学习的分解图表示，Factorized Graph Representations for Semi-Supervised Learning from Sparse Data

【SIGMOD2020】稀疏数据半监督学习的分解图表示，Factorized Graph Representations for Semi-Supervised Learning from Sparse Data

专知会员服务

15+阅读 · 2020年3月7日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

《城市滨海地区：理解复杂多变环境下的指挥控制框架》50页报告

《理解城市战及其在俄乌战争中的表现》报告

美空军“顶点2025”实验：推进AI在C2、动态目标锁定与联盟集成中的应用

《建设式兵棋模拟作为战术集群配置优化的关键组成部分》

相关资讯

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

【论文推荐】最新五篇命名实体识别相关论文—深度主动学习、Lattice LSTM、混合马尔可夫CRF

【论文推荐】最新五篇命名实体识别相关论文—深度主动学习、Lattice LSTM、混合马尔可夫CRF

专知

26+阅读 · 2018年5月22日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

相关论文

Data augmentation by morphological mixup for solving Raven's Progressive Matrices

Arxiv

0+阅读 · 2021年11月19日

Multi-view Contrastive Graph Clustering

Arxiv

13+阅读 · 2021年10月22日

Dense Contrastive Visual-Linguistic Pretraining

Arxiv

6+阅读 · 2021年9月24日

Re-distributing Biased Pseudo Labels for Semi-supervised Semantic Segmentation: A Baseline Investigation

Arxiv

4+阅读 · 2021年7月26日

Graph Contrastive Learning with Adaptive Augmentation

Arxiv

5+阅读 · 2021年2月15日

Data Augmentation for Graph Neural Networks

Arxiv

38+阅读 · 2020年12月2日

Contrastive Clustering

Arxiv

31+阅读 · 2020年9月21日

Data Augmentation using Pre-trained Transformer Models

Arxiv

17+阅读 · 2020年3月4日

Unsupervised Data Augmentation for Consistency Training

Arxiv

5+阅读 · 2019年7月10日

Multi-task Learning for Universal Sentence Embeddings: A Thorough Evaluation using Transfer and Auxiliary Tasks

Multi-task Learning for Universal Sentence Embeddings: A Thorough Evaluation using Transfer and Auxiliary Tasks

Arxiv

3+阅读 · 2018年8月16日

微信扫码咨询专知VIP会员