Originating from cooperative game theory, Shapley values have become one of the most widely used measures of variable importance in applied machine learning. However, the statistical understanding of Shapley values remains limited. In this paper, we take a nonparametric (or smoothing) perspective by introducing Shapley curves as a local measure of variable importance. We consider two estimation strategies and establish consistency and asymptotic normality under both independence and dependence among the features. We further propose a novel version of the wild bootstrap procedure, specifically adapted to Shapley curves, which allows us to construct confidence intervals and conduct inference. The asymptotic results are validated in extensive experiments. In an empirical application, we analyze which attributes drive vehicle prices.
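To make "local measure of variable importance" concrete, the sketch below evaluates a Shapley curve at a single point x. It is a minimal sketch under stated assumptions, not the paper's estimator. It assumes the common component-based definition φ_j(x) = Σ_{S ⊆ {1,…,d}∖{j}} |S|!(d−|S|−1)!/d! · [m_{S∪{j}}(x_{S∪{j}}) − m_S(x_S)], with m_S(x_S) = E[m(X) | X_S = x_S]. It plugs in a known regression function instead of a nonparametric estimate, and it assumes independent features, so each conditional expectation reduces to an average over background draws. The names `component`, `shapley_curve`, and `background` are hypothetical. Neither of the paper's two estimation strategies nor its wild bootstrap is implemented here.

```python
import itertools
import math
import random


def component(m, x, S, background):
    """Estimate m_S(x_S) = E[m(X) | X_S = x_S] (hypothetical helper).

    ASSUMPTION: features are independent, so the conditional expectation is
    approximated by averaging m over background rows whose coordinates in S
    are overwritten with the values of x.
    """
    total = 0.0
    for row in background:
        z = [x[k] if k in S else row[k] for k in range(len(x))]
        total += m(z)
    return total / len(background)


def shapley_curve(m, x, j, background):
    """Value of the Shapley curve of feature j at the point x (hypothetical helper)."""
    d = len(x)
    others = [k for k in range(d) if k != j]
    phi = 0.0
    for size in range(d):
        # Shapley weight |S|! (d - |S| - 1)! / d! for coalitions of this size
        weight = math.factorial(size) * math.factorial(d - size - 1) / math.factorial(d)
        for coalition in itertools.combinations(others, size):
            S = set(coalition)
            gain = component(m, x, S | {j}, background) - component(m, x, S, background)
            phi += weight * gain
    return phi


# Illustrative setup: a known (not estimated) regression function with an interaction.
random.seed(0)
background = [[random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(200)]


def m(z):
    return z[0] + 2.0 * z[1] + z[0] * z[2]


x = [1.0, -0.5, 2.0]
curve_values = [shapley_curve(m, x, j, background) for j in range(3)]
```

Repeating the evaluation over a grid of points x traces out the curve x ↦ φ_j(x). By construction, the values at one point sum to m(x) minus the background average of m, which is the efficiency property of Shapley values.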