Deep learning systems require large amounts of training data. Datasets for training face verification systems are difficult to obtain and raise privacy concerns. Synthetic data generated by generative models such as GANs can be a viable alternative. However, we show that GAN-generated data are prone to bias and fairness issues. Specifically, GANs trained on the FFHQ dataset are biased towards generating white faces in the 20-29 age group. We also demonstrate that synthetic faces cause disparate impact, specifically with respect to the race attribute, when used to fine-tune face verification systems.
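As a rough illustration of the disparate-impact notion referenced above (not the paper's exact evaluation protocol), the standard disparate-impact ratio compares favorable-outcome rates, e.g. verification accuracy, between each demographic group and a reference group. The group labels and numbers in this minimal sketch are hypothetical.

```python
import numpy as np

def disparate_impact(outcomes, groups, reference_group):
    """Ratio of favorable-outcome rates between each group and a reference group.

    outcomes: binary array (1 = favorable outcome, e.g. correct verification)
    groups:   per-sample group labels (e.g. race attribute)
    """
    ref_rate = outcomes[groups == reference_group].mean()
    return {g: outcomes[groups == g].mean() / ref_rate for g in np.unique(groups)}

# Hypothetical verification outcomes per race group after fine-tuning on
# synthetic faces; ratios well below 1.0 would indicate disparate impact.
outcomes = np.array([1, 1, 0, 1, 0, 1, 1, 0, 0, 1])
groups = np.array(["A", "A", "B", "A", "B", "A", "A", "B", "B", "A"])
print(disparate_impact(outcomes, groups, reference_group="A"))
```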