Dikaios: Privacy Auditing of Algorithmic Fairness via Attribute Inference Attacks

Machine learning (ML) models have been deployed in high-stakes applications. When the sensitive attribute is imbalanced in the training data, ML models can be unfair to minority subgroups identified by that attribute, such as race or sex. In-processing fairness algorithms aim to make model predictions independent of the sensitive attribute. At the same time, ML models are vulnerable to attribute inference attacks, in which an adversary infers the value of a sensitive attribute by exploiting the fact that model predictions are distinguishable across attribute values. Although privacy and fairness are both important pillars of trustworthy ML, the privacy risk that fairness algorithms introduce with respect to attribute leakage has not been studied. We identify attribute inference attacks as an effective means of auditing black-box fairness algorithms, enabling model builders to account for both privacy and fairness in model design. We propose Dikaios, a privacy auditing tool for fairness algorithms aimed at model builders, which leverages a new, effective attribute inference attack that accounts for class imbalance in the sensitive attribute through an adaptive prediction threshold. We use Dikaios to perform a privacy audit of two in-processing fairness algorithms over five datasets. We show that our attribute inference attacks with an adaptive prediction threshold significantly outperform prior attacks. We highlight the limitations of in-processing fairness algorithms in ensuring indistinguishable predictions across different values of the sensitive attribute. Indeed, the attribute privacy risk of these in-processing fairness schemes varies widely with the proportion of each sensitive attribute value in the dataset. This unpredictable effect of fairness mechanisms on attribute privacy risk is an important limitation on their use, one that model builders must account for.
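To make the idea of an adaptive prediction threshold concrete, the following is a minimal sketch and not the Dikaios implementation. The abstract does not say how the threshold is chosen, so this sketch assumes the adversary holds auxiliary records with known sensitive attributes and picks the cut-off that maximizes balanced accuracy on them. The function names (`simulate`, `fit_adaptive_threshold`) and the synthetic score distributions are hypothetical and serve only as illustration.

```python
import random


def balanced_accuracy(labels, preds):
    """Mean of the true-positive rate and the true-negative rate."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both attribute values must be present")
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    return 0.5 * (tp / pos + tn / neg)


def apply_threshold(scores, threshold):
    """Predict attribute value 1 when the attack score reaches the threshold."""
    return [1 if s >= threshold else 0 for s in scores]


def fit_adaptive_threshold(aux_scores, aux_labels):
    """Assumed selection rule: scan a grid of cut-offs and keep the one
    with the highest balanced accuracy on the auxiliary data."""
    best_t, best_ba = 0.5, -1.0
    for t in (i / 100 for i in range(101)):
        ba = balanced_accuracy(aux_labels, apply_threshold(aux_scores, t))
        if ba > best_ba:
            best_t, best_ba = t, ba
    return best_t


def simulate(n, minority_rate, rng):
    """Synthetic (hypothetical) attack scores. Because the attribute is
    imbalanced, scores for both groups sit well below 0.5; the minority
    group (label 1) is only shifted slightly upward."""
    scores, labels = [], []
    for _ in range(n):
        label = 1 if rng.random() < minority_rate else 0
        mean = 0.30 if label == 1 else 0.15
        scores.append(min(1.0, max(0.0, rng.gauss(mean, 0.08))))
        labels.append(label)
    return scores, labels


rng = random.Random(0)
aux_scores, aux_labels = simulate(2000, 0.1, rng)      # adversary's auxiliary data
test_scores, test_labels = simulate(2000, 0.1, rng)    # records under attack

threshold = fit_adaptive_threshold(aux_scores, aux_labels)
fixed_ba = balanced_accuracy(test_labels, apply_threshold(test_scores, 0.5))
adaptive_ba = balanced_accuracy(test_labels, apply_threshold(test_scores, threshold))
print(f"adaptive threshold: {threshold:.2f}")
print(f"balanced accuracy, fixed 0.5 cut-off: {fixed_ba:.3f}")
print(f"balanced accuracy, adaptive cut-off:  {adaptive_ba:.3f}")
```

In this toy setting the fixed 0.5 cut-off almost never predicts the minority value, so the attack looks weak even though the two groups' scores are clearly separable. A data-driven cut-off exposes that leakage, which is the kind of effect an audit under class imbalance would need to capture.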
