Counterfactual explanations are one of the most popular methods to make predictions of black box machine learning models interpretable by providing explanations in the form of `what-if scenarios'. Most current approaches optimize a collapsed, weighted sum of multiple objectives, whose weights are difficult to balance a priori. We propose the Multi-Objective Counterfactuals (MOC) method, which translates the counterfactual search into a multi-objective optimization problem. Our approach not only returns a diverse set of counterfactuals with different trade-offs between the proposed objectives, but also maintains diversity in feature space. This enables a more detailed post-hoc analysis to facilitate better understanding and also more options for actionable user responses to change the predicted outcome. Our approach is also model-agnostic and works for numerical and categorical input features. We show the usefulness of MOC in concrete cases and compare our approach with state-of-the-art methods for counterfactual explanations.
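To make the contrast concrete, the following minimal sketch compares the two formulations on a toy problem: a collapsed weighted-sum score versus keeping the full Pareto set of non-dominated candidate counterfactuals. This is an illustrative assumption, not MOC's implementation; the toy model, the three objectives (prediction gap, proximity, sparsity), and all names are hypothetical stand-ins for the kinds of objectives discussed above.

```python
# Hypothetical sketch: weighted-sum vs. multi-objective counterfactual search.
# The model, data, and objective definitions are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

def black_box(x):
    # Stand-in for any black-box model: a fixed linear score squashed to [0, 1].
    w = np.array([0.8, -0.5, 0.3])
    return 1.0 / (1.0 + np.exp(-(x @ w)))

x_orig = np.array([1.0, 2.0, -1.0])
target = 0.9  # desired prediction for the counterfactual

def objectives(x):
    # o1: gap between the model prediction and the desired target
    # o2: proximity of the counterfactual to the original instance
    # o3: sparsity, i.e. how many features were changed
    o1 = abs(black_box(x) - target)
    o2 = np.linalg.norm(x - x_orig, ord=1)
    o3 = float(np.sum(~np.isclose(x, x_orig)))
    return np.array([o1, o2, o3])

def collapsed(x, weights=(1.0, 0.5, 0.1)):
    # Weighted-sum approach: one scalar score, weights fixed a priori.
    return float(np.dot(weights, objectives(x)))

def pareto_front(candidates):
    # Multi-objective view: keep every candidate not dominated by another,
    # i.e. no other candidate is at least as good on all objectives and
    # strictly better on one.
    objs = np.array([objectives(c) for c in candidates])
    front = []
    for i, oi in enumerate(objs):
        dominated = any(
            np.all(oj <= oi) and np.any(oj < oi)
            for j, oj in enumerate(objs) if j != i
        )
        if not dominated:
            front.append(candidates[i])
    return front

candidates = [x_orig + rng.normal(scale=0.5, size=3) for _ in range(200)]
front = pareto_front(candidates)
best_scalar = min(candidates, key=collapsed)
print(f"Pareto-optimal counterfactuals: {len(front)} of {len(candidates)}")
print(f"Weighted-sum pick: {best_scalar.round(2)}, "
      f"objectives {objectives(best_scalar).round(2)}")
```

Whereas the weighted-sum score commits to one trade-off in advance and returns a single candidate, the Pareto filter returns the whole set of defensible trade-offs, which the user can inspect post hoc.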