SHAP (SHapley Additive exPlanation) values are one of the leading tools for interpreting machine learning models, with strong theoretical guarantees (consistency, local accuracy) and widely available implementations and use cases. Although computing SHAP values takes exponential time in general, TreeSHAP takes polynomial time on tree-based models. While this speedup is significant, TreeSHAP can still dominate the computation time of industry-level machine learning solutions on datasets with millions of entries or more, causing delays in post-hoc model diagnosis and interpretation services. In this paper we present two new algorithms, Fast TreeSHAP v1 and v2, designed to improve the computational efficiency of TreeSHAP for large datasets. We empirically find that Fast TreeSHAP v1 is 1.5x faster than TreeSHAP while keeping the memory cost unchanged. Similarly, Fast TreeSHAP v2 is 2.5x faster than TreeSHAP, at the cost of slightly higher memory usage, thanks to the pre-computation of expensive TreeSHAP steps. We also show that Fast TreeSHAP v2 is well-suited for multi-time model interpretations, resulting in up to 3x faster explanation of newly incoming samples.
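To make the "exponential time in general" and "local accuracy" claims concrete, the following is a minimal sketch of exact Shapley value computation by brute-force subset enumeration. It is not TreeSHAP or Fast TreeSHAP, and it is not taken from the paper; the toy model `toy_model`, the input `x`, the all-zero `baseline`, and the choice to represent a missing feature by its baseline value are all assumptions made for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(value, n_features):
    """Exact Shapley values by enumerating every coalition.

    `value` maps a frozenset of feature indices to a number. The loop
    visits all 2^(n-1) coalitions per feature, which is why the general
    computation is exponential in the number of features.
    """
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # Shapley weight shared by all coalitions of this size
            weight = (
                factorial(size)
                * factorial(n_features - size - 1)
                / factorial(n_features)
            )
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i to coalition s
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy model with one additive term and one interaction term.
def toy_model(z):
    return 2.0 * z[0] + z[1] * z[2]


x = [1.0, 2.0, 3.0]         # sample to explain (illustrative)
baseline = [0.0, 0.0, 0.0]  # assumed reference input for "missing" features


def value(subset):
    # Features outside the coalition are replaced by their baseline value.
    z = [x[j] if j in subset else baseline[j] for j in range(len(x))]
    return toy_model(z)


phi = shapley_values(value, len(x))

# Local accuracy: the attributions sum to f(x) - f(baseline).
gap = sum(phi) - (toy_model(x) - toy_model(baseline))
print(phi, gap)
```

In this toy setting the additive term is credited entirely to feature 0, and the interaction term is split equally between features 1 and 2. Tree-specific algorithms such as TreeSHAP avoid this enumeration by exploiting the tree structure, which is where the polynomial running time mentioned above comes from.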