Stable Bias: Analyzing Societal Representations in Diffusion Models

As machine learning-enabled Text-to-Image (TTI) systems become increasingly prevalent and see growing adoption as commercial services, characterizing the social biases they exhibit is a necessary first step to lowering their risk of discriminatory outcomes. This evaluation, however, is made more difficult by the synthetic nature of these systems' outputs: since artificial depictions of fictive humans have no inherent gender or ethnicity, nor do they belong to socially constructed groups, we need to look beyond common categorizations of diversity or representation. To address this need, we propose a new method for exploring and quantifying social biases in TTI systems by directly comparing collections of generated images designed to showcase a system's variation across social attributes (gender and ethnicity) and target attributes for bias evaluation (professions and gender-coded adjectives). Our approach allows us to (i) identify specific bias trends through visualization tools, (ii) provide targeted scores to directly compare models in terms of diversity and representation, and (iii) jointly model interdependent social variables to support a multidimensional analysis. We use this approach to analyze over 96,000 images generated by three popular TTI systems (DALL-E 2, Stable Diffusion v1.4, and Stable Diffusion v2) and find that all three significantly over-represent the portion of their latent space associated with whiteness and masculinity across target attributes; among the systems studied, DALL-E 2 shows the least diversity, followed by Stable Diffusion v2, then v1.4.
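The abstract does not define how the targeted diversity scores in (ii) are computed. As a purely illustrative sketch, and not the authors' metric, one common way to compare systems is to assign each generated image to a group (for example, a cluster of visually similar images) and take the normalized Shannon entropy of the resulting distribution. The function name `normalized_entropy`, the system names, and the toy assignments below are all hypothetical.

```python
import math
from collections import Counter


def normalized_entropy(labels, num_groups):
    """Shannon entropy of the label distribution, scaled to [0, 1].

    0 means every image fell into the same group; 1 means the images
    are spread evenly over all `num_groups` groups.
    """
    if num_groups < 2 or not labels:
        return 0.0
    counts = Counter(labels)
    total = len(labels)
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(num_groups)


# Hypothetical group assignments for images generated from one prompt
# by two systems (made-up toy data, not results from the paper).
assignments = {
    "system_a": [0, 0, 0, 0, 0, 0, 0, 1],  # concentrated in one group
    "system_b": [0, 0, 1, 1, 2, 2, 3, 3],  # evenly spread over four groups
}

scores = {
    name: normalized_entropy(ids, num_groups=4)
    for name, ids in assignments.items()
}

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Under this assumed score, a system whose images concentrate in a single group scores near 0, and one whose images are spread evenly scores near 1, which gives a single number per prompt that can be compared across systems.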
