Large datasets underlying much of current machine learning raise serious issues concerning inappropriate content, such as images that are offensive, insulting, threatening, or might otherwise cause anxiety. This calls for increased dataset documentation, e.g., using datasheets. Among other topics, datasheets encourage creators to reflect on the composition of their datasets. So far, however, this documentation has been done manually, which can be tedious and error-prone, especially for large image datasets. Here we ask the arguably "circular" question of whether a machine can help us reflect on inappropriate content, answering Question 16 in Datasheets. To this end, we propose to use the information stored in pre-trained transformer models to assist us in the documentation process. Specifically, prompt-tuning based on a dataset of socio-moral values steers CLIP to identify potentially inappropriate content, thereby reducing human labor. We then document the flagged images using word clouds built from captions generated with a vision-language model. The resulting documentation of two popular, large-scale computer vision datasets -- ImageNet and OpenImages -- suggests that machines can indeed help dataset creators answer Question 16 on inappropriate image content.
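To make the pipeline described above more concrete, the following is a minimal sketch of CLIP-based flagging of potentially inappropriate images. It uses hand-written text prompts as a stand-in for the learned (prompt-tuned) prompts trained on socio-moral values in the paper; the prompt wording, the image path, and the decision threshold are illustrative assumptions, not the authors' exact setup.

```python
import torch
import clip  # OpenAI CLIP package (https://github.com/openai/CLIP)
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Hand-written prompts standing in for the paper's tuned soft prompts
# (assumption: the real system learns these from a socio-moral dataset).
prompts = [
    "an image showing offensive, insulting, or disturbing content",
    "an image showing ordinary, unobjectionable content",
]
text_tokens = clip.tokenize(prompts).to(device)

# Hypothetical example image from the dataset being documented.
image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, text_tokens)
    probs = logits_per_image.softmax(dim=-1)

# Flag the image for human review if the "inappropriate" prompt scores higher;
# the 0.5 threshold is an illustrative choice.
if probs[0, 0].item() > 0.5:
    print("potentially inappropriate -- flag for documentation and review")
```

In the documentation step itself, captions for the flagged images would then be generated with a vision-language model and aggregated into word clouds, keeping a human in the loop for the final judgment.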