Deep learning systems need large amounts of data for training. Datasets for training face verification systems are difficult to obtain and prone to privacy issues. Synthetic data generated by generative models such as GANs can be a good alternative. However, we show that data generated by GANs are prone to bias and fairness issues. Specifically, GANs trained on the FFHQ dataset show a bias towards generating white faces in the 20-29 age group. We also demonstrate that synthetic faces cause disparate impact, specifically for the race attribute, when used for fine-tuning face verification systems. This is measured using the $DoB_{fv}$ metric, which is defined as the standard deviation of GAR@FAR for face verification.
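To make the $DoB_{fv}$ definition concrete, the following is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes that the standard deviation is taken over per-group GAR@FAR values (for example, one value per race group), that each group's decision threshold is chosen on that group's own impostor scores, and that the population standard deviation is used; the abstract does not specify any of these. The function names `gar_at_far` and `dob_fv` and the toy scores are hypothetical.

```python
import statistics


def gar_at_far(genuine, impostor, far):
    """Genuine accept rate (GAR) at a threshold whose false accept rate is <= far.

    A pair is accepted when its similarity score is strictly above the threshold.
    """
    ranked = sorted(impostor, reverse=True)
    # Largest number of impostor pairs we may accept without exceeding the FAR target.
    k = int(far * len(ranked))
    # Accepting only scores strictly above the (k+1)-th highest impostor score
    # accepts at most k impostor pairs.
    threshold = ranked[k] if k < len(ranked) else float("-inf")
    accepted = sum(1 for score in genuine if score > threshold)
    return accepted / len(genuine)


def dob_fv(scores_by_group, far):
    """Standard deviation of per-group GAR@FAR values.

    ASSUMPTION: population standard deviation over groups, with a
    per-group threshold. The abstract does not state either choice.
    """
    gars = [
        gar_at_far(genuine, impostor, far)
        for genuine, impostor in scores_by_group.values()
    ]
    return statistics.pstdev(gars)


# Toy, made-up similarity scores: (genuine pairs, impostor pairs) per group.
groups = {
    "group_a": ([0.9, 0.8, 0.7, 0.6], [0.5, 0.4, 0.3, 0.2]),
    "group_b": ([0.9, 0.5, 0.3, 0.2], [0.6, 0.4, 0.3, 0.1]),
}
print(dob_fv(groups, far=0.25))  # 0.25
```

In this toy example the per-group GAR values are 1.0 and 0.5, so the score is 0.25. A value of 0 would mean every group attains the same GAR at the chosen FAR; larger values indicate a larger performance gap between groups.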