Over the past few years, machine learning models have emerged as a generic and powerful means of prediction. At the same time, there is a growing demand for interpretability of prediction models. To determine which features of a dataset are important for predicting a target variable $Y$, a Feature Importance (FI) method can be used. By quantifying how important each feature is for predicting $Y$, irrelevant features can be identified and removed, which can increase the speed and accuracy of a model; moreover, important features can be discovered, which can lead to valuable insights. A major problem with evaluating FI methods is that the ground truth FI is often unknown. As a consequence, existing FI methods do not give the exactly correct FI values. This is one of the many reasons why it can be hard to properly interpret the results of an FI method. Motivated by this, we introduce a new global approach named the Berkelmans-Pries FI method, which is based on a combination of Shapley values and the Berkelmans-Pries dependency function. We prove that our method has many useful properties, and that it accurately predicts the correct FI values for several cases where the ground truth FI can be derived in an exact manner. We experimentally show, for a large collection of FI methods (468 in total), that existing methods do not share these useful properties. This shows that the Berkelmans-Pries FI method is a highly valuable tool for analyzing datasets with complex interdependencies.
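As a rough, self-contained illustration of the Shapley-value side of this construction, the sketch below computes exact Shapley attributions over feature subsets. The `value` set function here is a hypothetical stand-in for a dependency score of $Y$ on a subset of features; it is not the Berkelmans-Pries dependency function itself, which is not defined in this abstract, and the helper name `shapley_feature_importance` is likewise illustrative.

```python
from itertools import combinations
from math import factorial


def shapley_feature_importance(features, value):
    """Exact Shapley attribution of a set function `value` over features.

    `value(S)` scores how well the feature subset S (a frozenset) explains
    the target Y; here it is a placeholder for any dependency measure.
    """
    n = len(features)
    importance = {f: 0.0 for f in features}
    for f in features:
        others = [g for g in features if g != f]
        for k in range(n):  # size of the coalition without f
            for subset in combinations(others, k):
                # Standard Shapley weight |S|! (n - |S| - 1)! / n!
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                # Marginal contribution of f to this coalition
                gain = value(frozenset(subset) | {f}) - value(frozenset(subset))
                importance[f] += weight * gain
    return importance


# Toy usage with a hypothetical dependency score (not the BP dependency):
# Y is fully determined by "x1" and independent of "x2".
toy_value = lambda S: 1.0 if "x1" in S else 0.0
print(shapley_feature_importance(["x1", "x2"], toy_value))
# -> {'x1': 1.0, 'x2': 0.0}
```

The exact computation enumerates all $2^{n-1}$ coalitions per feature, so it is only practical for small feature sets; it is meant to convey how Shapley values distribute a subset-level dependency score over individual features.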