The original development of Shapley values for prediction explanation relied on the assumption that the features being described were independent. If the features are in fact dependent, this may lead to incorrect explanations. Hence, there have recently been attempts to model or estimate the dependence between the features appropriately. Although the proposed methods clearly outperform the traditional approach that assumes independence, they have their weaknesses. In this paper we propose two new approaches for modelling the dependence between the features. Both approaches are based on vine copulas, which are flexible tools for modelling multivariate non-Gaussian distributions and are able to characterise a wide range of complex dependencies. The performance of the proposed methods is evaluated on simulated data sets and a real data set. The experiments demonstrate that the vine copula approaches give more accurate approximations to the true Shapley values than their competitors.