Contrastive self-supervised learning (SSL) learns an embedding space that maps similar data pairs closer together and dissimilar pairs farther apart. Despite its success, one issue has been overlooked: the fairness of representations learned with contrastive SSL. Without mitigation, contrastive SSL techniques can incorporate sensitive information such as gender or race and cause potentially unfair predictions on downstream tasks. In this paper, we propose a Conditional Contrastive Learning (CCL) approach to improve the fairness of contrastive SSL methods. Our approach samples positive and negative pairs from distributions conditioned on the sensitive attribute; empirically, this means sampling positive and negative pairs from the same gender or the same race. We show that our approach provably maximizes the conditional mutual information between the learned representations of the positive pairs, and reduces the effect of the sensitive attribute by taking it as the conditioning variable. On seven fairness and vision datasets, we empirically demonstrate that the proposed approach achieves state-of-the-art downstream performance among unsupervised baselines and significantly improves the fairness of contrastive SSL models on multiple fairness metrics.
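As a minimal sketch of the conditional sampling idea, the snippet below restricts the candidates in an InfoNCE-style contrastive loss to samples that share the anchor's sensitive-attribute value, so negatives are drawn from the same group (e.g., the same gender or race). The function name `conditional_infonce`, the masking scheme, and the temperature value are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def conditional_infonce(z1, z2, sensitive, temperature=0.1):
    """InfoNCE-style loss where, for each anchor, negatives are drawn
    only from samples sharing the anchor's sensitive-attribute value.

    z1, z2:    (N, d) embeddings of two augmented views of the batch.
    sensitive: (N,)   sensitive-attribute labels (e.g., gender or race).
    """
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    sim = z1 @ z2.t() / temperature  # (N, N) cosine similarities

    # Keep column j as a candidate for anchor i only if they share the
    # same sensitive-attribute value; the diagonal (the positive pair)
    # always survives because sensitive[i] == sensitive[i].
    same_attr = sensitive.unsqueeze(0) == sensitive.unsqueeze(1)
    sim = sim.masked_fill(~same_attr, float("-inf"))

    # Cross-entropy with the matching view (the diagonal) as the target;
    # masked entries contribute nothing to the softmax denominator.
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(sim, targets)
```

Because both the positive and all surviving negatives share the anchor's sensitive-attribute value, similarity along that attribute cannot help discriminate the positive, which is the mechanism by which conditioning reduces the attribute's influence on the learned representation.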