Counterfactuals, an emerging type of model interpretation, have recently received attention from both researchers and practitioners. Counterfactual explanations formalize the exploration of ``what-if'' scenarios, and are an instance of example-based reasoning over a set of hypothetical data samples. Counterfactuals essentially show how the model decision changes under input perturbations. Existing methods for generating counterfactuals are mainly algorithm-based; they are time-inefficient and assume the same counterfactual universe for different queries. To address these limitations, we propose a Model-based Counterfactual Synthesizer (MCS) framework for interpreting machine learning models. We first analyze the model-based counterfactual process and construct a base synthesizer using a conditional generative adversarial net (CGAN). To better approximate the counterfactual universe for rare queries, we employ the umbrella sampling technique to train the MCS framework. In addition, we enhance the MCS framework by incorporating the causal dependence among attributes as a model inductive bias, and validate the correctness of its design from the causality identification perspective. Experimental results on several datasets demonstrate both the effectiveness and efficiency of the proposed MCS framework, and verify its advantages over other alternatives.
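The key contrast the abstract draws is between algorithm-based methods, which run a fresh optimization per query, and a model-based synthesizer, which generates counterfactuals in a single forward pass of a trained conditional generator. The toy sketch below illustrates only that interface, not the MCS method itself: the classifier, the fixed linear ``generator'', and all names here are hypothetical stand-ins for the trained CGAN generator described in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy classifier: predicts class 1 iff the feature sum is positive.
def classify(x):
    return int(x.sum() > 0)

# Stand-in for a trained conditional generator. In MCS this would be the
# CGAN generator conditioned on the query and target class; here it is a
# fixed map used only to illustrate the single-forward-pass interface.
def generator(z, query, target_class):
    direction = 1.0 if target_class == 1 else -1.0
    # Shift the query toward the target class; the noise z adds diversity.
    return query + direction * (1.0 + 0.1 * z)

def synthesize_counterfactuals(query, target_class, n=5):
    """Model-based synthesis: n candidate counterfactuals in n forward
    passes, with no per-query optimization loop."""
    zs = rng.standard_normal((n, query.shape[0]))
    candidates = [generator(z, query, target_class) for z in zs]
    # Keep only candidates the model actually assigns to the target class.
    return [c for c in candidates if classify(c) == target_class]

query = np.array([-0.6, -0.4])  # classified as 0 by the toy model
cfs = synthesize_counterfactuals(query, target_class=1)
```

Each returned sample is a perturbed version of the query that flips the toy model's decision, mirroring how a learned synthesizer amortizes counterfactual search across queries instead of re-solving it per query.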