To mitigate the effects of undesired biases in models, several approaches pre-process the input dataset to reduce the risk of discrimination by preventing the inference of sensitive attributes. Unfortunately, most of these pre-processing methods generate a new distribution that is very different from the original one, which often yields unrealistic data. As a side effect, existing models must be retrained on this new distribution before they can make accurate predictions. To address this issue, we propose a novel pre-processing method, which we call fair mapping, based on transforming the distribution of protected groups onto a chosen target one, with additional privacy constraints whose objective is to prevent the inference of sensitive attributes. More precisely, we build on the Wasserstein GAN and AttGAN frameworks to achieve the optimal transport of data points, coupled with a discriminator that enforces protection against attribute inference. Our approach preserves the interpretability of the data and can be used without exactly defining the sensitive groups. In addition, it can be specialized to model existing state-of-the-art approaches, thus offering a unifying view of these methods. Finally, several experiments on real and synthetic datasets demonstrate that our approach is able to hide the sensitive attributes while limiting the distortion of the data and improving fairness in subsequent data analysis tasks.
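As a rough illustration of the idea of transporting a protected group's distribution onto a chosen target, the sketch below uses empirical quantile matching, which is the optimal transport map in one dimension. This is not the GAN-based method described above: it has no critic, no attribute-inference discriminator, and handles a single feature only. The function name `quantile_map` and the synthetic Gaussian groups are assumptions made for the example.

```python
import numpy as np

def quantile_map(source, target):
    """Send each source sample to the target value at the same empirical quantile.

    In one dimension this monotone rearrangement is the optimal transport map.
    Hypothetical helper for illustration; not part of the described method.
    """
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    # Rank of each source point within its own group (0 .. n-1).
    ranks = np.argsort(np.argsort(source))
    # Convert ranks to quantile levels strictly inside (0, 1).
    levels = (ranks + 0.5) / len(source)
    # Read off the target distribution at those quantile levels.
    return np.quantile(target, levels)

# Synthetic data (assumed): one feature for a protected group and a target group.
rng = np.random.default_rng(0)
protected = rng.normal(loc=-1.0, scale=1.0, size=2000)
target = rng.normal(loc=1.0, scale=0.5, size=2000)

mapped = quantile_map(protected, target)
```

After the mapping, `mapped` follows the target distribution closely while keeping the relative order of the original samples, which is one simple sense in which the transformed data stays close to the original. A learned, multi-dimensional transport such as the one the abstract describes would replace the quantile step.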