Biases in models pose a critical issue when deploying machine learning systems, but diagnosing them in an explainable manner can be challenging. To address this, we introduce the bias-to-text (B2T) framework, which uses language interpretation to identify and mitigate biases in vision models, such as image classifiers and text-to-image generative models. Our language descriptions of visual biases provide explainable forms that enable the discovery of novel biases and effective model debiasing. To achieve this, we analyze common keywords in the captions of mispredicted or generated images. Here, we propose novel score functions to avoid biases in captions by comparing the similarities between bias keywords and those images. Additionally, we present strategies to debias zero-shot classifiers and text-to-image diffusion models using the bias keywords from the B2T framework. We demonstrate the effectiveness of our framework on various image classification and generation tasks. For classifiers, we discover a new spurious correlation between the keywords "(sports) player" and "female" in Kaggle Face and improve the worst-group accuracy on Waterbirds by 11% through debiasing, compared to the baseline. For generative models, we detect and effectively prevent unfair (e.g., gender-biased) and unsafe (e.g., "naked") image generation.
翻译:暂无翻译