In recent years, most fairness strategies in machine learning models focus on mitigating unwanted biases by assuming that the sensitive information is observed. However this is not always possible in practice. Due to privacy purposes and var-ious regulations such as RGPD in EU, many personal sensitive attributes are frequently not collected. We notice a lack of approaches for mitigating bias in such difficult settings, in particular for achieving classical fairness objectives such as Demographic Parity and Equalized Odds. By leveraging recent developments for approximate inference, we propose an approach to fill this gap. Based on a causal graph, we rely on a new variational auto-encoding based framework named SRCVAE to infer a sensitive information proxy, that serve for bias mitigation in an adversarial fairness approach. We empirically demonstrate significant improvements over existing works in the field. We observe that the generated proxy's latent space recovers sensitive information and that our approach achieves a higher accuracy while obtaining the same level of fairness on two real datasets, as measured using com-mon fairness definitions.

1
下载
关闭预览

相关内容

Both fair machine learning and adversarial learning have been extensively studied. However, attacking fair machine learning models has received less attention. In this paper, we present a framework that seeks to effectively generate poisoning samples to attack both model accuracy and algorithmic fairness. Our attacking framework can target fair machine learning models trained with a variety of group based fairness notions such as demographic parity and equalized odds. We develop three online attacks, adversarial sampling , adversarial labeling, and adversarial feature modification. All three attacks effectively and efficiently produce poisoning samples via sampling, labeling, or modifying a fraction of training data in order to reduce the test accuracy. Our framework enables attackers to flexibly adjust the attack's focus on prediction accuracy or fairness and accurately quantify the impact of each candidate point to both accuracy loss and fairness violation, thus producing effective poisoning samples. Experiments on two real datasets demonstrate the effectiveness and efficiency of our framework.

0
0
下载
预览

Graph neural networks (GNNs) have shown great power in modeling graph structured data. However, similar to other machine learning models, GNNs may make predictions biased on protected sensitive attributes, e.g., skin color and gender. Because machine learning algorithms including GNNs are trained to reflect the distribution of the training data which often contains historical bias towards sensitive attributes. In addition, the discrimination in GNNs can be magnified by graph structures and the message-passing mechanism. As a result, the applications of GNNs in sensitive domains such as crime rate prediction would be largely limited. Though extensive studies of fair classification have been conducted on i.i.d data, methods to address the problem of discrimination on non-i.i.d data are rather limited. Furthermore, the practical scenario of sparse annotations in sensitive attributes is rarely considered in existing works. Therefore, we study the novel and important problem of learning fair GNNs with limited sensitive attribute information. FairGNN is proposed to eliminate the bias of GNNs whilst maintaining high node classification accuracy by leveraging graph structures and limited sensitive information. Our theoretical analysis shows that FairGNN can ensure the fairness of GNNs under mild conditions given limited nodes with known sensitive attributes. Extensive experiments on real-world datasets also demonstrate the effectiveness of FairGNN in debiasing and keeping high accuracy.

0
0
下载
预览

With the growing interest in the machine learning community to solve real-world problems, it has become crucial to uncover the hidden reasoning behind their decisions by focusing on the fairness and auditing the predictions made by these black-box models. In this paper, we propose a novel method to address two key issues: (a) Can we simultaneously learn fair disentangled representations while ensuring the utility of the learned representation for downstream tasks, and (b)Can we provide theoretical insights into when the proposed approach will be both fair and accurate. To address the former, we propose the method FRIED, Fair Representation learning using Interpolation Enabled Disentanglement. In our architecture, by imposing a critic-based adversarial framework, we enforce the interpolated points in the latent space to be more realistic. This helps in capturing the data manifold effectively and enhances the utility of the learned representation for downstream prediction tasks. We address the latter question by developing a theory on fairness-accuracy trade-offs using classifier-based conditional mutual information estimation. We demonstrate the effectiveness of FRIED on datasets of different modalities - tabular, text, and image datasets. We observe that the representations learned by FRIED are overall fairer in comparison to existing baselines and also accurate for downstream prediction tasks. Additionally, we evaluate FRIED on a real-world healthcare claims dataset where we conduct an expert aided model auditing study providing useful insights into opioid ad-diction patterns.

0
0
下载
预览

Feature attribution is often loosely presented as the process of selecting a subset of relevant features as a rationale of a prediction. This lack of clarity stems from the fact that we usually do not have access to any notion of ground-truth attribution and from a more general debate on what good interpretations are. In this paper we propose to formalise feature selection/attribution based on the concept of relaxed functional dependence. In particular, we extend our notions to the instance-wise setting and derive necessary properties for candidate selection solutions, while leaving room for task-dependence. By computing ground-truth attributions on synthetic datasets, we evaluate many state-of-the-art attribution methods and show that, even when optimised, some fail to verify the proposed properties and provide wrong solutions.

0
3
下载
预览

Learning disentanglement aims at finding a low dimensional representation which consists of multiple explanatory and generative factors of the observational data. The framework of variational autoencoder (VAE) is commonly used to disentangle independent factors from observations. However, in real scenarios, factors with semantics are not necessarily independent. Instead, there might be an underlying causal structure which renders these factors dependent. We thus propose a new VAE based framework named CausalVAE, which includes a Causal Layer to transform independent exogenous factors into causal endogenous ones that correspond to causally related concepts in data. We further analyze the model identifiabitily, showing that the proposed model learned from observations recovers the true one up to a certain degree. Experiments are conducted on various datasets, including synthetic and real word benchmark CelebA. Results show that the causal representations learned by CausalVAE are semantically interpretable, and their causal relationship as a Directed Acyclic Graph (DAG) is identified with good accuracy. Furthermore, we demonstrate that the proposed CausalVAE model is able to generate counterfactual data through "do-operation" to the causal factors.

0
11
下载
预览

Developing classification algorithms that are fair with respect to sensitive attributes of the data has become an important problem due to the growing deployment of classification algorithms in various social contexts. Several recent works have focused on fairness with respect to a specific metric, modeled the corresponding fair classification problem as a constrained optimization problem, and developed tailored algorithms to solve them. Despite this, there still remain important metrics for which we do not have fair classifiers and many of the aforementioned algorithms do not come with theoretical guarantees; perhaps because the resulting optimization problem is non-convex. The main contribution of this paper is a new meta-algorithm for classification that takes as input a large class of fairness constraints, with respect to multiple non-disjoint sensitive attributes, and which comes with provable guarantees. This is achieved by first developing a meta-algorithm for a large family of classification problems with convex constraints, and then showing that classification problems with general types of fairness constraints can be reduced to those in this family. We present empirical results that show that our algorithm can achieve near-perfect fairness with respect to various fairness metrics, and that the loss in accuracy due to the imposed fairness constraints is often small. Overall, this work unifies several prior works on fair classification, presents a practical algorithm with theoretical guarantees, and can handle fairness metrics that were previously not possible.

0
3
下载
预览

Attributes, such as metadata and profile, carry useful information which in principle can help improve accuracy in recommender systems. However, existing approaches have difficulty in fully leveraging attribute information due to practical challenges such as heterogeneity and sparseness. These approaches also fail to combine recurrent neural networks which have recently shown effectiveness in item recommendations in applications such as video and music browsing. To overcome the challenges and to harvest the advantages of sequence models, we present a novel approach, Heterogeneous Attribute Recurrent Neural Networks (HA-RNN), which incorporates heterogeneous attributes and captures sequential dependencies in \textit{both} items and attributes. HA-RNN extends recurrent neural networks with 1) a hierarchical attribute combination input layer and 2) an output attribute embedding layer. We conduct extensive experiments on two large-scale datasets. The new approach show significant improvements over the state-of-the-art models. Our ablation experiments demonstrate the effectiveness of the two components to address heterogeneous attribute challenges including variable lengths and attribute sparseness. We further investigate why sequence modeling works well by conducting exploratory studies and show sequence models are more effective when data scale increases.

0
3
下载
预览

Metric learning learns a metric function from training data to calculate the similarity or distance between samples. From the perspective of feature learning, metric learning essentially learns a new feature space by feature transformation (e.g., Mahalanobis distance metric). However, traditional metric learning algorithms are shallow, which just learn one metric space (feature transformation). Can we further learn a better metric space from the learnt metric space? In other words, can we learn metric progressively and nonlinearly like deep learning by just using the existing metric learning algorithms? To this end, we present a hierarchical metric learning scheme and implement an online deep metric learning framework, namely ODML. Specifically, we take one online metric learning algorithm as a metric layer, followed by a nonlinear layer (i.e., ReLU), and then stack these layers modelled after the deep learning. The proposed ODML enjoys some nice properties, indeed can learn metric progressively and performs superiorly on some datasets. Various experiments with different settings have been conducted to verify these properties of the proposed ODML.

0
8
下载
预览

Deep metric learning has been demonstrated to be highly effective in learning semantic representation and encoding information that can be used to measure data similarity, by relying on the embedding learned from metric learning. At the same time, variational autoencoder (VAE) has widely been used to approximate inference and proved to have a good performance for directed probabilistic models. However, for traditional VAE, the data label or feature information are intractable. Similarly, traditional representation learning approaches fail to represent many salient aspects of the data. In this project, we propose a novel integrated framework to learn latent embedding in VAE by incorporating deep metric learning. The features are learned by optimizing a triplet loss on the mean vectors of VAE in conjunction with standard evidence lower bound (ELBO) of VAE. This approach, which we call Triplet based Variational Autoencoder (TVAE), allows us to capture more fine-grained information in the latent embedding. Our model is tested on MNIST data set and achieves a high triplet accuracy of 95.60% while the traditional VAE (Kingma & Welling, 2013) achieves triplet accuracy of 75.08%.

0
3
下载
预览

Amortized inference has led to efficient approximate inference for large datasets. The quality of posterior inference is largely determined by two factors: a) the ability of the variational distribution to model the true posterior and b) the capacity of the recognition network to generalize inference over all datapoints. We analyze approximate inference in variational autoencoders in terms of these factors. We find that suboptimal inference is often due to amortizing inference rather than the limited complexity of the approximating distribution. We show that this is due partly to the generator learning to accommodate the choice of approximation. Furthermore, we show that the parameters used to increase the expressiveness of the approximation play a role in generalizing inference rather than simply improving the complexity of the approximation.

0
3
下载
预览
小贴士
相关论文
Minh-Hao Van,Wei Du,Xintao Wu,Aidong Lu
0+阅读 · 10月17日
Akshita Jha,Bhanukiran Vinzamuri,Chandan K. Reddy
0+阅读 · 10月13日
Darius Afchar,Romain Hennequin,Vincent Guigue
3+阅读 · 4月26日
Mengyue Yang,Furui Liu,Zhitang Chen,Xinwei Shen,Jianye Hao,Jun Wang
11+阅读 · 3月23日
Classification with Fairness Constraints: A Meta-Algorithm with Provable Guarantees
L. Elisa Celis,Lingxiao Huang,Vijay Keswani,Nisheeth K. Vishnoi
3+阅读 · 2018年8月2日
Kuan Liu,Xing Shi,Prem Natarajan
3+阅读 · 2018年5月28日
Wenbin Li,Jing Huo,Yinghuan Shi,Yang Gao,Lei Wang,Jiebo Luo
8+阅读 · 2018年5月15日
Haque Ishfaq,Assaf Hoogi,Daniel Rubin
3+阅读 · 2018年4月3日
Chris Cremer,Xuechen Li,David Duvenaud
3+阅读 · 2018年1月10日
相关资讯
Hierarchically Structured Meta-learning
CreateAMind
11+阅读 · 2019年5月22日
Unsupervised Learning via Meta-Learning
CreateAMind
29+阅读 · 2019年1月3日
Disentangled的假设的探讨
CreateAMind
7+阅读 · 2018年12月10日
vae 相关论文 表示学习 1
CreateAMind
10+阅读 · 2018年9月6日
Hierarchical Disentangled Representations
CreateAMind
3+阅读 · 2018年4月15日
【论文】变分推断(Variational inference)的总结
机器学习研究会
24+阅读 · 2017年11月16日
gan生成图像at 1024² 的 代码 论文
CreateAMind
4+阅读 · 2017年10月31日
Adversarial Variational Bayes: Unifying VAE and GAN 代码
CreateAMind
7+阅读 · 2017年10月4日
Auto-Encoding GAN
CreateAMind
5+阅读 · 2017年8月4日
Top