Interpretable machine learning seeks to understand the reasoning process of complex black-box systems, which are notorious for their lack of explainability. One flourishing approach is counterfactual explanation, which suggests what a user can do to alter an outcome. A counterfactual example must not only counter the original prediction of the black-box classifier, but should also satisfy various constraints for practical applications. Diversity is one such critical constraint, yet it remains underexplored: while diverse counterfactuals are ideal, it is computationally challenging to satisfy other constraints at the same time. Furthermore, there is growing concern over the privacy of released counterfactual data. To this end, we propose a feature-based learning framework that effectively handles the counterfactual constraints and adds to the limited pool of privacy-preserving explanation models. We demonstrate the flexibility and effectiveness of our method in generating diverse counterfactuals that are actionable and plausible. Our counterfactual engine is more efficient than counterparts of the same capacity while yielding the lowest re-identification risk.