Originating from cooperative game theory, Shapley values have become one of the most widely used measures of variable importance in applied machine learning. However, the statistical understanding of Shapley values is still limited. In this paper, we take a nonparametric (or smoothing) perspective by introducing Shapley curves as a local measure of variable importance. We propose two estimation strategies and derive their consistency and asymptotic normality under both independence and dependence among the features. This allows us to construct confidence intervals and conduct inference on the estimated Shapley curves. The asymptotic results are validated in extensive experiments. In an empirical application, we analyze which attributes drive the prices of vehicles.