This work addresses fair generative models. Dataset biases have been a major cause of unfairness in deep generative models. Previous work proposed augmenting large, biased datasets with small, unbiased reference datasets, and under this setup introduced a weakly-supervised approach that achieves state-of-the-art quality and fairness in generated samples. Building on this setup, we propose a simple yet effective approach. First, we propose fairTL, a transfer-learning approach to learning fair generative models: we pre-train the generative model on the available large, biased dataset and subsequently adapt it using the small, unbiased reference dataset. We find that fairTL learns expressive sample generation during pre-training, thanks to the large (biased) dataset; this knowledge is then transferred to the target model during adaptation, which also learns to capture the underlying fair distribution of the small reference dataset. Second, we propose fairTL++, which introduces two additional innovations over fairTL: (i) multiple feedback and (ii) Linear Probing followed by Fine-Tuning (LP-FT). Going one step further, we consider an alternative, more challenging setup in which only a pre-trained (potentially biased) model is available and the dataset used to pre-train it is inaccessible. We demonstrate that fairTL and fairTL++ remain highly effective under this setup, whereas previous work requires access to the large, biased dataset and cannot handle it. Extensive experiments show that fairTL and fairTL++ achieve state-of-the-art quality and fairness of generated samples. Code and additional resources are available at bearwithchris.github.io/fairTL/.
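To make the Linear Probing followed by Fine-Tuning (LP-FT) ingredient of fairTL++ concrete, the following is a minimal sketch on a toy regression model, not the paper's implementation: all names, the model, and the data are illustrative assumptions. The two phases are (1) linear probing, which fits only a new head on frozen "pre-trained" features, and (2) fine-tuning, which then updates the backbone and head jointly with small gradient steps, here standing in for adaptation on a small reference set.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "pre-trained" backbone (frozen during linear probing) and a small
# reference set -- hypothetical stand-ins for the generator and the
# unbiased reference data described in the abstract.
W_backbone = rng.normal(size=(8, 4))   # maps 8-dim input -> 4-dim features
X_ref = rng.normal(size=(32, 8))       # small reference set
y_ref = rng.normal(size=(32,))         # regression target, for illustration


def mse(pred, y):
    return float(np.mean((pred - y) ** 2))


# Phase 1: linear probing -- fit only the head on frozen features
# (closed-form least squares plays the role of training the probe).
F = np.tanh(X_ref @ W_backbone)
head, *_ = np.linalg.lstsq(F, y_ref, rcond=None)
loss_lp = mse(F @ head, y_ref)

# Phase 2: fine-tuning -- starting from the probed head, update backbone
# and head jointly with a small learning rate.
lr = 1e-2
W_ft, head_ft = W_backbone.copy(), head.copy()
for _ in range(200):
    H = np.tanh(X_ref @ W_ft)
    err = H @ head_ft - y_ref                        # residuals, shape (32,)
    grad_head = H.T @ err / len(y_ref)
    grad_H = np.outer(err, head_ft) * (1 - H ** 2)   # backprop through tanh
    grad_W = X_ref.T @ grad_H / len(y_ref)
    head_ft -= lr * grad_head
    W_ft -= lr * grad_W

loss_ft = mse(np.tanh(X_ref @ W_ft) @ head_ft, y_ref)
print(f"linear-probe loss {loss_lp:.4f} -> fine-tuned loss {loss_ft:.4f}")
```

The ordering matters: probing first gives the head a sensible initialization, so the subsequent full fine-tune starts from a point where gradients do not destructively overwrite the pre-trained features, which is the motivation usually given for LP-FT.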