Machine learning models are becoming increasingly popular across many application settings. This is largely driven by their ability to achieve a level of predictive performance that is hard for human experts to match in this new era of big data. With this growth in usage comes an increased demand for accountability and for understanding of the models' predictions. However, the sophistication of the most successful models (e.g. ensembles, deep learning) is a major obstacle to this endeavour, as these models are essentially black boxes. In this paper we describe two general approaches that can be used to provide interpretable descriptions of the expected performance of any black-box classification model. These approaches are of high practical relevance, as they provide means to uncover and describe, in an interpretable way, situations where a model's performance is expected to deviate significantly from its average behaviour. This may be of critical relevance for applications where costly decisions are driven by the model's predictions, as it can be used to warn end users against relying on the model in specific cases.
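The general idea described above can be illustrated with a minimal sketch (this is an illustrative assumption, not the paper's exact method): fit a shallow, interpretable meta-model (here a depth-limited decision tree) that predicts a black-box classifier's per-instance errors from the input features; the meta-model's leaves then describe regions of the input space where the black box's performance deviates from its average behaviour. All model choices and dataset below are hypothetical stand-ins.

```python
# Hedged sketch: describe where a black-box classifier under-performs by
# fitting a shallow decision tree to its per-instance error indicator.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_breast_cancer()
X_tr, X_te, y_tr, y_te = train_test_split(
    data.data, data.target, random_state=0
)

# The "black box" whose expected performance we want to describe.
black_box = RandomForestClassifier(n_estimators=100, random_state=0)
black_box.fit(X_tr, y_tr)

# Binary target for the meta-model: did the black box err on this case?
errors = (black_box.predict(X_te) != y_te).astype(int)

# Shallow, interpretable meta-model over the error indicator.
meta = DecisionTreeClassifier(max_depth=3, random_state=0)
meta.fit(X_te, errors)

# Each root-to-leaf path is a readable rule; leaves whose error rate
# differs markedly from the overall rate flag risky (or safe) regions.
print(export_text(meta, feature_names=list(data.feature_names)))
```

Since the overall error rate is a weighted average of the per-leaf error rates, the worst leaf always has an error rate at least as high as the average, so the extracted rules genuinely isolate regions of deviating performance.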