We dissect an experimental credit scoring model developed with real data and demonstrate how the use of location information introduces racial bias, even without access to protected attributes. We analyze the gradient boosted tree model with the aid of a game-theoretic ML explainability technique, counterfactual experiments, and Brazilian census data. The experiment attests to the importance of developing methods and language that go beyond requiring access to protected attributes when auditing ML models, to the necessity of considering regional specifics when reflecting on racial issues, and to the value of census data for the AI research community. To the best of our knowledge, this is the first documented case of how algorithmic racial bias can easily emerge in an ML credit scoring model built with Brazilian data; Brazil is the country with the largest Black population outside Africa.
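To make the auditing workflow concrete, the following is a minimal sketch of the two techniques the abstract names: game-theoretic (Shapley-value) attributions on a gradient boosted tree scorer, and a location counterfactual that rescores applicants as if they lived elsewhere. Everything here is an illustrative assumption, not the authors' pipeline: the synthetic data, the hypothetical `neighborhood_id` location feature, and the choice of the `shap` and `xgboost` libraries.

```python
# Sketch of a bias audit without protected attributes: measure how much a
# location feature drives a credit score, then swap the location and rescore.
import numpy as np
import pandas as pd
import shap
from xgboost import XGBClassifier

# Toy stand-in for the credit scoring data; all values are synthetic.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "income": rng.normal(3000, 800, 5000),
    "debt_ratio": rng.uniform(0.0, 1.0, 5000),
    "neighborhood_id": rng.integers(0, 50, 5000),  # location proxy
})
y = ((X["income"] / 3000 - X["debt_ratio"]
      + 0.02 * X["neighborhood_id"]
      + rng.normal(0, 0.3, 5000)) > 0.5).astype(int)
model = XGBClassifier(n_estimators=200, max_depth=4).fit(X, y)

# Shapley-value attributions: how much does location move each score?
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
loc_idx = X.columns.get_loc("neighborhood_id")
print("mean |SHAP| of location:", np.abs(shap_values[:, loc_idx]).mean())

# Counterfactual experiment: rescore every applicant as if they lived in a
# reference neighborhood and measure the shift in predicted risk.
X_cf = X.copy()
X_cf["neighborhood_id"] = 0  # hypothetical reference location
delta = model.predict_proba(X_cf)[:, 1] - model.predict_proba(X)[:, 1]
print("mean score shift under relocation:", delta.mean())
```

In the paper's setting, the last step would be joined with census data on the racial composition of each location, so that score shifts can be compared across demographically distinct regions without ever observing an individual's race.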