Most proposed algorithmic fairness techniques require access to data on a "sensitive attribute" or "protected category" (such as race, ethnicity, gender, or sexuality) in order to make performance comparisons and standardizations across groups; however, this data is largely unavailable in practice, hindering the widespread adoption of algorithmic fairness. In this paper, we consider calls to collect more data on demographics to enable algorithmic fairness, and we challenge the notion that discrimination can be overcome with smart enough technical methods and sufficient data alone. We show how these techniques largely ignore broader questions of data governance and systemic oppression when categorizing individuals for the purpose of fairer algorithmic processing. Specifically, we explore under what conditions demographic data should be collected and used to enable algorithmic fairness methods by characterizing a range of social risks to individuals and communities. The risks to individuals include the unique privacy risks associated with sharing the sensitive attributes likely to be the target of fairness analysis, the possible harms stemming from miscategorizing and misrepresenting individuals in the data collection process, and the use of sensitive data beyond data subjects' expectations. Looking more broadly, the risks to entire groups and communities include the expansion of surveillance infrastructure in the name of fairness, the misrepresentation and mischaracterization of what it means to be part of a demographic group or to hold a certain identity, and the ceding of communities' ability to define for themselves what constitutes biased or unfair treatment. We argue that, by confronting these questions before and during the collection of demographic data, algorithmic fairness methods are more likely to actually mitigate harmful treatment disparities without reinforcing systems of oppression.