Nine language-vision AI models trained on web scrapes with the Contrastive Language-Image Pretraining (CLIP) objective are evaluated for evidence of a bias studied by psychologists: the sexual objectification of girls and women, which occurs when a person's human characteristics are disregarded and the person is treated as a body or a collection of body parts. A first experiment uses standardized images of women from the Sexual OBjectification and EMotion Database, and finds that, commensurate with prior research in psychology, human characteristics are disassociated from images of objectified women: the model's recognition of emotional state is mediated by whether the subject is fully or partially clothed. Embedding association tests (EATs) return significant effect sizes for both anger (d > .8) and sadness (d > .5). A second experiment measures the effect in a representative application: an automatic image captioner (Antarctic Captions) includes words denoting emotion less than 50% as often for images of partially clothed women as for images of fully clothed women. A third experiment finds that images of female professionals (scientists, doctors, executives) are more likely to be associated with sexual descriptions than images of male professionals. A fourth experiment shows that a prompt of "a [age] year old girl" generates sexualized images (as determined by an NSFW classifier) up to 73% of the time for VQGAN-CLIP (age 17), and up to 40% of the time for Stable Diffusion (ages 14 and 18); the corresponding rate for boys never surpasses 9%. The evidence indicates that language-vision AI models trained on automatically collected web scrapes learn biases of sexual objectification, which propagate to downstream applications.
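To make the reported effect sizes concrete, the following is a minimal sketch of the WEAT-style effect size that embedding association tests commonly report: the difference in mean association between two target sets, divided by the standard deviation of associations over both sets. It is an illustration under stated assumptions, not the procedure used in the experiments: the function names, the use of the sample standard deviation, and the toy 2-D vectors are all hypothetical stand-ins for real CLIP image and text embeddings. By the usual Cohen's d conventions, values near .5 are read as medium effects and values near .8 as large.

```python
import math
import statistics


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, attrs_a, attrs_b):
    """Mean similarity of w to attribute set A minus its mean similarity to B."""
    sim_a = [cosine(w, a) for a in attrs_a]
    sim_b = [cosine(w, b) for b in attrs_b]
    return statistics.mean(sim_a) - statistics.mean(sim_b)


def eat_effect_size(targets_x, targets_y, attrs_a, attrs_b):
    """Cohen's-d-style effect size for target sets X, Y and attribute sets A, B.

    Assumption: the denominator is the sample standard deviation of the
    associations over X and Y combined; implementations differ on this detail.
    """
    s_x = [association(x, attrs_a, attrs_b) for x in targets_x]
    s_y = [association(y, attrs_a, attrs_b) for y in targets_y]
    spread = statistics.stdev(s_x + s_y)
    return (statistics.mean(s_x) - statistics.mean(s_y)) / spread


# Toy vectors, invented for illustration only (not real embeddings).
targets_x = [[1.0, 0.1], [0.9, 0.2]]   # stand-in for one group of images
targets_y = [[0.1, 1.0], [0.2, 0.9]]   # stand-in for the contrasting group
attrs_a = [[1.0, 0.0]]                 # stand-in for one attribute set
attrs_b = [[0.0, 1.0]]                 # stand-in for the contrasting attribute set

d = eat_effect_size(targets_x, targets_y, attrs_a, attrs_b)
print(f"effect size d = {d:.2f}")
```

A positive d here means the first target set is more strongly associated with attribute set A than the second target set is; swapping the two target sets flips the sign.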