Counterfactual explanation methods interpret the outputs of a machine learning model through "what-if scenarios" without compromising the fidelity-interpretability trade-off. They explain how to obtain a desired prediction from the model by recommending small changes to the input features, also known as recourse. We believe an actionable recourse should be built on sound counterfactual explanations that originate from the distribution of the ground-truth data and are linked to domain knowledge. Moreover, it needs to preserve coherency between changed and unchanged features while satisfying user- and domain-specified constraints. This paper introduces CARE, a modular explanation framework that addresses the model- and user-level desiderata in a sequential and structured manner. We tackle the existing requirements with novel and efficient solutions formulated in a multi-objective optimization framework. The designed framework allows arbitrary requirements to be included, and counterfactual explanations and actionable recourse to be generated by choice. As a model-agnostic approach, CARE generates multiple, diverse explanations for any black-box model in tabular classification and regression settings. Several experiments on standard data sets and black-box models demonstrate the effectiveness of our modular framework and its superior performance compared to the baselines.