With the rising adoption of machine learning across domains such as banking, pharmaceuticals, and ed-tech, it has become critically important to adopt responsible AI methods that ensure models do not unfairly discriminate against any group. Given the scarcity of clean training data, generative adversarial networks (GANs) are widely used to generate synthetic data, with several state-of-the-art architectures readily available across domains, from unstructured data such as text and images to structured datasets modelling tasks like fraud detection. These techniques overcome several challenges, such as class imbalance, limited training data, and restricted access to data due to privacy concerns. Existing work on generating fair data is either tied to a specific GAN architecture or is very difficult to tune across GANs. In this paper, we propose a pipeline to generate fairer synthetic data independent of the GAN architecture. The proposed pipeline uses a pre-processing algorithm to identify and remove bias-inducing samples. In particular, we claim that most GANs amplify the bias present in the training data while generating synthetic data, but that by removing these bias-inducing samples, a GAN focuses more on the genuinely informative samples. Our experimental evaluation on two open-source datasets demonstrates that the proposed pipeline generates fairer data, along with improved performance in some cases.