Counterfactual explanation is an important Explainable AI technique for explaining machine learning predictions. Despite active study, existing optimization-based methods often assume that the underlying machine learning model is differentiable and treat categorical attributes as continuous ones, which restricts their real-world applicability when categorical attributes take many distinct values or the model is non-differentiable. To make counterfactual explanation suitable for real-world applications, we propose a novel framework of Model-Agnostic Counterfactual Explanation (MACE), which adopts a newly designed pipeline that can efficiently handle non-differentiable machine learning models with a large number of feature values. In our MACE approach, we propose a novel RL-based method for finding good counterfactual examples and a gradient-less descent method for improving proximity. Experiments on public datasets validate the effectiveness of MACE, which achieves better validity, sparsity, and proximity.
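To illustrate the flavor of gradient-less proximity refinement mentioned above, the sketch below shows a generic gradient-free procedure that pulls a valid counterfactual back toward the original instance using only black-box predictions. This is a minimal illustration under stated assumptions, not the actual MACE algorithm; the names `predict_fn`, `refine_counterfactual`, and the `shrink` factor are all hypothetical.

```python
# Minimal, illustrative sketch of gradient-free proximity refinement for a
# counterfactual example. NOT the MACE method itself; only a generic
# gradient-less descent in the same spirit, with hypothetical names.
import numpy as np

def refine_counterfactual(predict_fn, x, cf, target_class,
                          n_iters=200, shrink=0.5, rng=None):
    """Pull a valid counterfactual `cf` closer to the original instance `x`,
    one randomly chosen feature at a time, keeping only moves that preserve
    the target prediction. Requires no gradients from the model."""
    rng = rng or np.random.default_rng(0)
    cf = cf.copy()
    for _ in range(n_iters):
        j = rng.integers(len(x))              # pick one feature to shrink
        candidate = cf.copy()
        # move feature j part of the way back toward its original value
        candidate[j] = cf[j] + shrink * (x[j] - cf[j])
        # accept the move only if the black-box model still predicts the
        # desired class, i.e. the counterfactual remains valid
        if predict_fn(candidate[None, :])[0] == target_class:
            cf = candidate
    return cf
```

Because acceptance depends only on `predict_fn` outputs, the same loop works for tree ensembles or any other non-differentiable model, which is the setting the abstract targets.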