Deep forest is a non-differentiable deep model that has achieved impressive empirical success across a wide variety of applications, especially on categorical/symbolic or mixed modeling tasks. Many of these application fields prefer explainable models, such as random forests, for which feature contributions provide a local explanation of each prediction and Mean Decrease Impurity (MDI) provides global feature importance. However, deep forest, as a cascade of random forests, is interpretable only at the first layer. From the second layer on, many tree splits occur on new features generated by the previous layer, which makes existing explanatory tools for random forests inapplicable. To disclose the impact of the original features in the deep layers, we design a calculation method with an estimation step followed by a calibration step for each layer, and propose feature contribution and MDI feature importance calculation tools for deep forest. Experimental results on both simulated and real-world data verify the effectiveness of our methods.
Interpreting Deep Forest through Feature Contribution and MDI Feature Importance
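To make the notion of a local feature contribution concrete, the following is a minimal sketch of the usual path-based decomposition for a single regression tree: each split on the path from root to leaf credits the change in node mean to the feature it splits on, so the prediction equals the root mean plus the sum of per-feature contributions. The tree, its dictionary layout, and the function name are hypothetical and chosen only for illustration. This is not the estimation and calibration procedure the abstract proposes for deep forest. It only shows the first-layer case, where every split feature is an original feature.

```python
# Hypothetical hand-built regression tree (illustrative layout, not a library format).
# Internal node: mean target value, split feature index, threshold, two children.
# Leaf: mean target value only.
TREE = {
    "value": 10.0, "feature": 0, "threshold": 0.5,
    "left": {"value": 6.0},
    "right": {
        "value": 14.0, "feature": 1, "threshold": 2.0,
        "left": {"value": 12.0},
        "right": {"value": 16.0},
    },
}


def feature_contributions(tree, x, n_features):
    """Decompose one prediction as bias + sum(contributions).

    bias is the mean target value at the root; contributions[j] accumulates
    the change in node mean caused by every split on feature j along the path.
    """
    bias = tree["value"]
    contributions = [0.0] * n_features
    node = tree
    while "feature" in node:  # descend until a leaf is reached
        j = node["feature"]
        child = node["left"] if x[j] <= node["threshold"] else node["right"]
        contributions[j] += child["value"] - node["value"]
        node = child
    prediction = node["value"]
    return bias, contributions, prediction


bias, contrib, pred = feature_contributions(TREE, [1.0, 3.0], n_features=2)
print(bias, contrib, pred)  # 10.0 [4.0, 2.0] 16.0
```

For a random forest, the same decomposition would be averaged over trees. In a deep forest, a split at the second layer or later may fall on a feature generated by the previous layer, so the index `j` no longer refers to an original feature and this direct bookkeeping stops working, which is the gap the abstract describes.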