We survey over 100 face datasets constructed between 1976 and 2019, comprising 145 million images of over 17 million subjects from a range of sources, demographics and conditions. Our historical survey reveals that these datasets are contextually informed, shaped by changes in political motivations, technological capability and current norms. We discuss how such influences mask specific practices (some of which may actually be harmful or otherwise problematic) and make a case for the explicit communication of such details in order to establish a more grounded understanding of the technology's function in the real world.