This study aims to define predictor importance for black-box machine learning methods, where the prediction function can be complex and cannot be represented by statistical parameters. We define a ``Generalized Variable Importance Metric (GVIM)'' using the true conditional expectation function for a continuous or binary response variable. We further show that the GVIM can be represented as a function of the Conditional Average Treatment Effect (CATE) for multinomial and continuous predictors. We then propose how the metric can be estimated using any machine learning model. Finally, using simulations, we evaluate the properties of the estimator when it is computed from XGBoost, Random Forest, and a mis-specified generalized additive model.