Interpretable machine learning has become a highly active area of research due to the rising popularity of machine learning algorithms and the inherent difficulty of interpreting them. Most work in this area has focused on the interpretation of single features in a model. However, for researchers and practitioners, it is often equally important to quantify the importance or visualize the effect of feature groups. To address this research gap, we provide a comprehensive overview of how existing model-agnostic techniques can be defined for feature groups to assess grouped feature importance, focusing on permutation-based, refitting, and Shapley-based methods. We also introduce an importance-based sequential procedure that identifies a stable and well-performing combination of features in the grouped feature space. Furthermore, we introduce the combined features effect plot, a technique that visualizes the effect of a group of features based on a sparse, interpretable linear combination of features. We use simulation studies and a real data example from computational psychology to analyze, compare, and discuss these methods.
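To illustrate the permutation-based approach mentioned above, the following is a minimal sketch of grouped permutation feature importance: all features in a group are permuted jointly (preserving the within-group dependence structure), and the resulting increase in loss is taken as the group's importance. This assumes a fitted scikit-learn-style regression model and a pandas DataFrame; the group names, column names, and function signature are hypothetical illustrations, not the paper's actual implementation.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error

def grouped_permutation_importance(model, X, y, groups, n_repeats=10, random_state=0):
    """Increase in loss when all features of a group are permuted jointly."""
    rng = np.random.default_rng(random_state)
    baseline = mean_squared_error(y, model.predict(X))
    importances = {}
    for name, cols in groups.items():
        losses = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            idx = rng.permutation(len(X))
            # Permute the whole block of columns with the same row order,
            # so the dependence structure within the group is preserved.
            X_perm[cols] = X[cols].iloc[idx].to_numpy()
            losses.append(mean_squared_error(y, model.predict(X_perm)))
        importances[name] = np.mean(losses) - baseline
    return importances

# Hypothetical usage with made-up group definitions:
# groups = {"demographics": ["age", "income"], "usage": ["clicks", "sessions"]}
# gpfi = grouped_permutation_importance(model, X_test, y_test, groups)
```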