Black-box machine learning models are increasingly used in high-stakes domains, creating a growing need for Explainable AI (XAI). Unfortunately, the use of XAI in machine learning introduces new privacy risks, which currently remain largely unnoticed. We introduce the explanation linkage attack, which can occur when instance-based strategies are deployed to find counterfactual explanations. To counter such an attack, we propose k-anonymous counterfactual explanations and introduce pureness as a new metric to evaluate the validity of these k-anonymous counterfactual explanations. Our results show that making the explanations, rather than the whole dataset, k-anonymous is beneficial for the quality of the explanations.