Monitoring machine learning models once they are deployed is challenging. It is even more challenging to decide when to retrain models in real-world scenarios where labeled data is unavailable and monitoring performance metrics becomes infeasible. In this work, we use non-parametric bootstrapped uncertainty estimates and SHAP values to provide explainable uncertainty estimation as a technique that aims to monitor the deterioration of machine learning models in deployment environments, and to determine the source of model deterioration when target labels are not available. Classical methods are aimed purely at detecting distribution shift, which can lead to false positives, in the sense that the model has not deteriorated despite a shift in the data distribution. To estimate model uncertainty, we construct prediction intervals using a novel bootstrap method that improves upon the work of Kumar & Srivastava (2012). We show that both our model deterioration detection system and our uncertainty estimation method achieve better performance than the current state of the art. Finally, we use explainable AI techniques to gain an understanding of the drivers of model deterioration. We release an open-source Python package, doubt, which implements our proposed methods, along with the code used to reproduce our experiments.
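As a rough illustration of the kind of non-parametric bootstrapped prediction intervals referred to above, the sketch below resamples the training data, refits the model on each resample, and combines the spread of the bootstrap predictions with out-of-bag residuals to form intervals. This is a simplified, generic version under stated assumptions, not the exact algorithm of the paper or of the doubt package; the function name, parameters, and use of a scikit-learn regressor are illustrative choices.

```python
# Minimal sketch of non-parametric bootstrap prediction intervals.
# Assumes numpy and scikit-learn; not the paper's exact method.
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression

def bootstrap_prediction_interval(model, X_train, y_train, X_new,
                                  n_boots=200, uncertainty=0.05,
                                  random_state=0):
    """Return point predictions and (lower, upper) interval bounds."""
    rng = np.random.default_rng(random_state)
    n = len(X_train)
    boot_preds = np.empty((n_boots, len(X_new)))
    residuals = []
    for b in range(n_boots):
        idx = rng.integers(0, n, size=n)           # resample with replacement
        m = clone(model).fit(X_train[idx], y_train[idx])
        boot_preds[b] = m.predict(X_new)
        oob = np.setdiff1d(np.arange(n), idx)      # out-of-bag indices
        if len(oob) > 0:
            residuals.append(y_train[oob] - m.predict(X_train[oob]))
    residuals = np.concatenate(residuals)
    # Add resampled residual noise so the intervals reflect both model
    # (epistemic) and observation (aleatoric) uncertainty.
    noise = rng.choice(residuals, size=boot_preds.shape)
    samples = boot_preds + noise
    lower = np.quantile(samples, uncertainty / 2, axis=0)
    upper = np.quantile(samples, 1 - uncertainty / 2, axis=0)
    preds = clone(model).fit(X_train, y_train).predict(X_new)
    return preds, lower, upper

# Example usage on synthetic data (hypothetical, for illustration only).
X = np.random.default_rng(1).normal(size=(500, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + np.random.default_rng(2).normal(scale=0.3, size=500)
preds, lo, hi = bootstrap_prediction_interval(LinearRegression(), X[:400], y[:400], X[400:])
```

In a monitoring setting, a widening of such intervals on incoming production data can serve as a label-free signal of deterioration, and explainability methods such as SHAP can then be applied to attribute that signal to individual features.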