Evidence shows that text-to-image (T2I) models disproportionately reflect Western cultural norms, amplifying the misrepresentation of, and harm to, minority groups. However, evaluating cultural sensitivity is inherently complex because culture is fluid and multifaceted. This paper draws on a state-of-the-art review and on co-creation workshops involving 59 participants from 19 countries. We developed and validated a mixed-methods, community-based methodology for evaluating cultural sensitivity in T2I models that embraces first-person methods. Quantitative scores and qualitative inquiries expose convergence and disagreement within and across communities, illuminate the downstream consequences of misrepresentation, and trace how training data shaped by unequal power relations distorts depictions. Extensive assessments are constrained by high resource requirements and by the dynamic nature of culture, a tension we alleviate through a context-based, iterative methodology. The paper provides actionable recommendations for stakeholders, highlighting pathways for investigating the sources, mechanisms, and impacts of cultural (mis)representation in T2I models.