Balancing the Picture: Debiasing Vision-Language Datasets with Synthetic Contrast Sets


May 24, 2023

Brandon Smith, Miguel Farinha, Siobhan Mackenzie Hall, Hannah Rose Kirk, Aleksandar Shtedritski, Max Bain

Source: arXiv. GitHub: https://github.com/oxai/debias-gensynth

Vision-language models are growing in popularity and public visibility as tools to generate, edit, and caption images at scale, but their outputs can perpetuate and amplify societal biases learned during pre-training on uncurated image-text pairs from the internet. Although debiasing methods have been proposed, we argue that the measurements of model bias they rely on lack validity due to dataset bias. We demonstrate that COCO Captions, the most commonly used dataset for evaluating bias, contains spurious correlations between background context and the gender of the people depicted. This is problematic because commonly used bias metrics (such as Bias@K) rely on per-gender base rates. To address this issue, we propose a novel dataset debiasing pipeline that augments the COCO dataset with synthetic, gender-balanced contrast sets, in which only the gender of the subject is edited and the background is fixed. However, existing image editing methods have limitations and sometimes produce low-quality images, so we introduce a method to automatically filter the generated images based on their similarity to real images. Using our balanced synthetic contrast sets, we benchmark bias in multiple CLIP-based models, demonstrating how metrics are skewed by imbalance in the original COCO images. Our results indicate that the proposed approach improves the validity of the evaluation, ultimately contributing to a more realistic understanding of bias in vision-language models.
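To make the base-rate point concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the commonly cited form of Bias@K: the signed difference between male and female counts among the top K retrieved images, normalized by their sum. The function name `bias_at_k` and the label strings `"male"`, `"female"`, and `"neutral"` are hypothetical choices made for illustration.

```python
from typing import Sequence


def bias_at_k(ranked_genders: Sequence[str], k: int) -> float:
    """Signed gender skew among the top-k retrieved images.

    Assumed definition: (n_male - n_female) / (n_male + n_female),
    or 0.0 when no gendered image appears in the top k.
    Labels other than "male" and "female" are ignored.
    """
    top = ranked_genders[:k]
    n_male = sum(1 for g in top if g == "male")
    n_female = sum(1 for g in top if g == "female")
    total = n_male + n_female
    if total == 0:
        return 0.0
    return (n_male - n_female) / total


# An imbalanced pool (hypothetical 70/30 split). Scoring the whole pool
# gives the skew that any gender-blind ranking would show on average.
imbalanced_pool = ["male"] * 70 + ["female"] * 30
imbalanced_skew = bias_at_k(imbalanced_pool, k=len(imbalanced_pool))

# A balanced contrast set: every image has a gender-edited counterpart,
# so the pool itself contributes no skew.
balanced_pool = ["male", "female"] * 50
balanced_skew = bias_at_k(balanced_pool, k=len(balanced_pool))

print(f"imbalanced pool skew: {imbalanced_skew:+.2f}")
print(f"balanced pool skew:   {balanced_skew:+.2f}")
```

Under this assumed definition, a retrieval model that ignores gender entirely would still score a nonzero Bias@K on average over the imbalanced pool, because the metric inherits the pool's base rates. On the balanced pool the same model would score zero on average, which is the motivation for evaluating on gender-balanced contrast sets.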
