Vertical federated learning (VFL) is a privacy-preserving machine learning paradigm that can learn models from features distributed across different platforms. Since real-world data may contain bias on fairness-sensitive features (e.g., gender), VFL models may inherit this bias from the training data and become unfair to some user groups. However, existing fair machine learning methods usually rely on centralized storage of fairness-sensitive features to achieve model fairness, which makes them inapplicable in most federated scenarios. In this paper, we propose a fair vertical federated learning framework (FairVFL), which can improve the fairness of VFL models. The core idea of FairVFL is to learn unified and fair representations of samples from decentralized feature fields in a privacy-preserving way. Specifically, each platform with fairness-insensitive features first learns local data representations from its local features. Then, these local representations are uploaded to a server and aggregated into a unified representation for the target task. To learn a fair unified representation, we send it to each platform storing fairness-sensitive features and apply adversarial learning to remove the bias that the unified representation inherits from the biased data. Moreover, to protect user privacy, we further propose a contrastive adversarial learning method that removes private information from the unified representation on the server before it is sent to the platforms holding fairness-sensitive features. Experiments on three real-world datasets validate that our method can effectively improve model fairness while keeping user privacy well protected.
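To make the data flow above concrete, the following is a minimal single-process Python sketch of the fairness-adversarial part of such a framework. It is not the FairVFL implementation. The linear local encoders, the element-wise-sum aggregation, the logistic task head and adversary, the gradient-reversal weight `lam`, and all function names (`matvec`, `aggregate`, `train_step`) are illustrative assumptions. All parties are simulated in one process, and the contrastive adversarial privacy step is omitted.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce(p, t):
    """Binary cross-entropy for one prediction p and target t in {0, 1}."""
    p = min(max(p, 1e-12), 1.0 - 1e-12)
    return -(t * math.log(p) + (1 - t) * math.log(1.0 - p))


def matvec(x, W):
    """Hypothetical linear local encoder: x (length d) times W (d x h)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]


def aggregate(local_reps):
    """Server-side aggregation, assumed here to be an element-wise sum."""
    return [sum(vals) for vals in zip(*local_reps)]


def train_step(batch, Ws, w_task, w_adv, lr=0.1, lam=0.5):
    """One SGD pass over batch; updates Ws, w_task and w_adv in place.

    Each example is (xs, y, s): xs holds one feature vector per
    fairness-insensitive platform, y is the task label and s is the
    binary fairness-sensitive attribute. Returns the mean task and
    adversary losses computed before each update.
    """
    task_loss = adv_loss = 0.0
    for xs, y, s in batch:
        reps = [matvec(x, W) for x, W in zip(xs, Ws)]  # local representations
        u = aggregate(reps)                             # unified representation
        p_y = sigmoid(sum(a * b for a, b in zip(u, w_task)))  # task head
        p_s = sigmoid(sum(a * b for a, b in zip(u, w_adv)))   # adversary
        task_loss += bce(p_y, y)
        adv_loss += bce(p_s, s)
        g_y, g_s = p_y - y, p_s - s  # d(BCE)/d(logit)
        # Gradient of (task loss - lam * adversary loss) w.r.t. u, using the
        # heads' current weights: the encoders are pushed to make s hard to
        # predict from u (gradient reversal).
        g_u = [g_y * wt - lam * g_s * wa for wt, wa in zip(w_task, w_adv)]
        for j in range(len(u)):
            w_task[j] -= lr * g_y * u[j]  # task head minimizes the task loss
            w_adv[j] -= lr * g_s * u[j]   # adversary minimizes its own loss
        for x, W in zip(xs, Ws):          # gradients back to each local encoder
            for i in range(len(x)):
                for j in range(len(u)):
                    W[i][j] -= lr * x[i] * g_u[j]
    return task_loss / len(batch), adv_loss / len(batch)


if __name__ == "__main__":
    random.seed(0)
    h = 3  # size of the (hypothetical) representation
    # Two fairness-insensitive platforms with two features each.
    Ws = [[[random.uniform(-0.5, 0.5) for _ in range(h)] for _ in range(2)]
          for _ in range(2)]
    w_task = [random.uniform(-0.5, 0.5) for _ in range(h)]
    w_adv = [random.uniform(-0.5, 0.5) for _ in range(h)]
    batch = []
    for _ in range(50):
        s = random.randint(0, 1)  # synthetic sensitive attribute
        xs = [[random.gauss(s, 1.0), random.gauss(0, 1.0)],
              [random.gauss(0, 1.0), random.gauss(0, 1.0)]]
        y = 1 if xs[0][1] + xs[1][0] > 0 else 0
        batch.append((xs, y, s))
    for epoch in range(5):
        lt, la = train_step(batch, Ws, w_task, w_adv)
        print(f"epoch {epoch}: task loss {lt:.3f}, adversary loss {la:.3f}")
```

In this sketch, `lam` trades off task accuracy against how strongly the local encoders are pushed to hide the sensitive attribute. In an actual deployment the adversary would run on the platform that stores the fairness-sensitive feature, so the unified representation `u` and its gradient would be what crosses platform boundaries.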