Machine learning models have been shown to inherit biases from their training datasets, which can be particularly problematic for vision-language foundation models trained on uncurated datasets scraped from the internet. These biases can be amplified and propagated to downstream applications such as zero-shot classifiers and text-to-image generative models. In this study, we propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding. In particular, we show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models. The closed-form solution enables easy integration into large-scale pipelines, and empirical results demonstrate that our approach effectively reduces social bias and spurious correlations in both discriminative and generative vision-language models without the need for additional data or training.
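To make the core idea concrete, below is a minimal sketch of the generic projection step: estimate bias directions from embeddings of paired prompts and remove their span from every text embedding with a single closed-form matrix. This illustrates only the basic orthogonal projection; the calibrated projection matrix described in the abstract refines this construction. The embedding vectors, dimension, and prompt pairs here are hypothetical stand-ins, not the paper's actual setup.

```python
import numpy as np

def projection_matrix(bias_directions: np.ndarray) -> np.ndarray:
    """Closed-form projector onto the orthogonal complement of the
    span of the given bias directions.

    bias_directions: (k, d) array, one direction per row, e.g. the
    difference of embeddings for a prompt pair such as
    "a photo of a man" vs. "a photo of a woman" (hypothetical).
    Returns the (d, d) matrix P = I - V^T (V V^T)^+ V.
    """
    V = bias_directions
    return np.eye(V.shape[1]) - V.T @ np.linalg.pinv(V @ V.T) @ V

# Hypothetical example with random stand-ins for CLIP-like text embeddings.
rng = np.random.default_rng(0)
d = 512                                    # assumed embedding dimension
emb_a = rng.normal(size=d)                 # stand-in: "a photo of a man"
emb_b = rng.normal(size=d)                 # stand-in: "a photo of a woman"
bias_dir = emb_a - emb_b
bias_dir /= np.linalg.norm(bias_dir)

P = projection_matrix(bias_dir[None, :])   # (1, d) -> (d, d) projector

text_emb = rng.normal(size=d)              # stand-in for a class prompt
debiased = P @ text_emb
# The debiased embedding has no component along the bias direction.
assert abs(debiased @ bias_dir) < 1e-8
```

Because P depends only on the prompt embeddings, it can be precomputed once and applied to any downstream text embedding, which is what makes the approach training-free and easy to drop into existing pipelines.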