Counterfactual explanations use feature perturbations to analyze the outcome of an original decision and recommend actionable recourse. We argue that it is beneficial to provide several alternative explanations rather than a single point solution, and we propose a probabilistic paradigm to estimate a diverse set of counterfactuals. Specifically, we treat the perturbations as random variables endowed with prior distribution functions. This allows sampling multiple counterfactuals from the posterior density, with the added benefits of incorporating inductive biases, preserving domain-specific constraints, and quantifying uncertainty in the estimates. More importantly, we leverage Bayesian hierarchical modeling to share information across different subgroups of a population, which can both improve robustness and measure fairness. A gradient-based sampler with superior convergence characteristics efficiently computes the posterior samples. Experiments across several datasets demonstrate that the counterfactuals estimated using our approach are valid, sparse, diverse, and feasible.
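To make the core idea concrete, the following is a minimal sketch of sampling several counterfactual perturbations from a posterior, not the authors' implementation. Everything specific in it is an assumption made for illustration: the logistic classifier and its weights `W` and `B`, the zero-mean Gaussian prior on the perturbation `delta`, the likelihood temperature `temp`, and the use of the Metropolis-adjusted Langevin algorithm (MALA) as a stand-in gradient-based sampler. The abstract does not name its sampler or priors, and the hierarchical sharing across subgroups is not shown here.

```python
import numpy as np

# Hypothetical fixed logistic-regression classifier (weights are made up).
W = np.array([1.5, -2.0, 0.5])
B = -1.0


def log_sigmoid(z):
    # Numerically stable log(sigmoid(z)).
    return -np.logaddexp(0.0, -z)


def log_post(delta, x0, temp=5.0, prior_std=1.0):
    # Unnormalised log posterior over the perturbation delta.
    # The likelihood term rewards the desired class (y = 1) at x0 + delta;
    # the Gaussian prior (an assumption) keeps the perturbation small.
    z = W @ (x0 + delta) + B
    return temp * log_sigmoid(z) - 0.5 * np.sum(delta ** 2) / prior_std ** 2


def grad_log_post(delta, x0, temp=5.0, prior_std=1.0):
    z = W @ (x0 + delta) + B
    # d/dz log sigmoid(z) = sigmoid(-z)
    sig_neg = np.exp(log_sigmoid(-z))
    return temp * sig_neg * W - delta / prior_std ** 2


def mala(x0, n_samples=2000, step=0.05, seed=0):
    # Metropolis-adjusted Langevin algorithm: a gradient-informed Gaussian
    # proposal followed by a Metropolis-Hastings accept/reject step.
    rng = np.random.default_rng(seed)
    delta = np.zeros_like(x0)
    draws = []
    for _ in range(n_samples):
        mean_fwd = delta + step * grad_log_post(delta, x0)
        prop = mean_fwd + np.sqrt(2.0 * step) * rng.standard_normal(delta.shape)
        mean_bwd = prop + step * grad_log_post(prop, x0)
        # Log densities of the forward and reverse proposals (up to a constant).
        log_q_fwd = -np.sum((prop - mean_fwd) ** 2) / (4.0 * step)
        log_q_bwd = -np.sum((delta - mean_bwd) ** 2) / (4.0 * step)
        log_alpha = (log_post(prop, x0) - log_post(delta, x0)
                     + log_q_bwd - log_q_fwd)
        if np.log(rng.uniform()) < log_alpha:
            delta = prop
        draws.append(delta.copy())
    return np.array(draws)


# An instance the hypothetical classifier assigns to the undesired class.
x0 = np.array([0.0, 1.0, 0.0])

samples = mala(x0)[500:]  # discard the first 500 draws as burn-in
probs = 1.0 / (1.0 + np.exp(-((x0 + samples) @ W + B)))

# Keep only perturbations that actually flip the decision.
valid = samples[probs > 0.5]
print("fraction of valid counterfactuals:", len(valid) / len(samples))
print("per-feature posterior std of delta:", samples.std(axis=0))
```

Each retained row of `valid` is one candidate recourse, and the spread of the draws gives a rough uncertainty estimate. A Laplace prior could be substituted for the Gaussian one if sparser perturbations are wanted, and domain constraints could be encoded by restricting the prior's support; both are left out to keep the sketch short.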