"What We Can't Measure, We Can't Understand": Challenges to Demographic Data Procurement in the Pursuit of Fairness

As calls for fair and unbiased algorithmic systems increase, so too does the number of individuals working on algorithmic fairness in industry. However, these practitioners often do not have access to the demographic data they feel they need to detect bias in practice. Even with the growing variety of toolkits and strategies for working towards algorithmic fairness, they almost invariably require access to demographic attributes or proxies. We investigated this dilemma through semi-structured interviews with 38 practitioners and professionals either working in or adjacent to algorithmic fairness. Participants painted a complex picture of what demographic data availability and use look like on the ground, ranging from not having access to personal data of any kind to being legally required to collect and use demographic data for discrimination assessments. In many domains, demographic data collection raises a host of difficult questions, including how to balance privacy and fairness, how to define relevant social categories, how to ensure meaningful consent, and whether it is appropriate for private companies to infer someone's demographics. Our research suggests challenges that must be considered by businesses, regulators, researchers, and community groups in order to enable practitioners to address algorithmic bias in practice. Critically, we do not propose that the overall goal of future work should be to simply lower the barriers to collecting demographic data. Rather, our study surfaces a swath of normative questions about how, when, and whether this data should be procured, and, in cases where it is not, what should still be done to mitigate bias.
