Supervised learning models are one of the most fundamental classes of models. Viewing supervised learning from a probabilistic perspective, the set of training data to which the model is fitted is usually assumed to follow a stationary distribution. However, this stationarity assumption is often violated in a phenomenon called concept drift, which refers to changes over time in the predictive relationship between covariates $\mathbf{X}$ and a response variable $Y$ and can render trained models suboptimal or obsolete. We develop a comprehensive and computationally efficient framework for detecting, monitoring, and diagnosing concept drift. Specifically, we monitor the Fisher score vector, defined as the gradient of the log-likelihood for the fitted model, using a form of multivariate exponentially weighted moving average, which monitors for general changes in the mean of a random vector. In spite of the substantial performance advantages that we demonstrate over popular error-based methods, a score-based approach has not been previously considered for concept drift monitoring. Advantages of the proposed score-based framework include applicability to any parametric model, more powerful detection of changes as shown in theory and experiments, and inherent diagnostic capabilities for helping to identify the nature of the changes.
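The monitoring idea described above can be illustrated with a minimal sketch. It is not the authors' implementation, and every specific in it is an assumption made for illustration: the logistic regression model, the simulated data, the smoothing constant `lam`, the use of a calibration sample to estimate the in-control score mean and covariance, and the empirical 0.999 quantile used as a control limit. All function names (`simulate`, `fit_logistic`, `logistic_scores`, `mewma_t2`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n, theta):
    """Draw (X, y) from a logistic model with an intercept and two covariates."""
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    p = 1.0 / (1.0 + np.exp(-X @ theta))
    y = (rng.random(n) < p).astype(float)
    return X, y

def fit_logistic(X, y, n_iter=25):
    """Maximum likelihood fit by Newton's method (small ridge for stability)."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ theta))
        grad = X.T @ (y - p)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        theta = theta + np.linalg.solve(hess + 1e-8 * np.eye(len(theta)), grad)
    return theta

def logistic_scores(X, y, theta):
    """Per-observation score vectors: gradient of the log-likelihood in theta."""
    p = 1.0 / (1.0 + np.exp(-X @ theta))
    return (y - p)[:, None] * X

def mewma_t2(scores, mean0, cov0, lam=0.05):
    """MEWMA of centered scores and a Hotelling-type T^2 statistic per step."""
    # Asymptotic covariance of the MEWMA vector is lam / (2 - lam) * cov0.
    cov_inv = np.linalg.inv(cov0) * (2.0 - lam) / lam
    z = np.zeros(scores.shape[1])
    t2 = np.empty(len(scores))
    for t, s in enumerate(scores):
        z = lam * (s - mean0) + (1.0 - lam) * z
        t2[t] = z @ cov_inv @ z
    return t2

# Assumed setup: fit once, calibrate on fresh in-control data, then monitor.
theta_old = np.array([0.5, 1.0, -1.0])
theta_new = np.array([0.5, -1.0, -1.0])   # drift: first slope flips sign

X_fit, y_fit = simulate(2000, theta_old)
theta_hat = fit_logistic(X_fit, y_fit)

X_cal, y_cal = simulate(2000, theta_old)
s_cal = logistic_scores(X_cal, y_cal, theta_hat)
mean0, cov0 = s_cal.mean(axis=0), np.cov(s_cal, rowvar=False)
limit = np.quantile(mewma_t2(s_cal, mean0, cov0), 0.999)  # empirical limit

X_st, y_st = simulate(1000, theta_old)    # no drift
X_dr, y_dr = simulate(1000, theta_new)    # after drift
t2_stable = mewma_t2(logistic_scores(X_st, y_st, theta_hat), mean0, cov0)
t2_drift = mewma_t2(logistic_scores(X_dr, y_dr, theta_hat), mean0, cov0)

print("alarm rate, stable stream: ", (t2_stable > limit).mean())
print("alarm rate, drifted stream:", (t2_drift > limit).mean())
```

Because each component of the score vector corresponds to one model parameter, inspecting which components of the smoothed score move away from zero gives a rough indication of which part of the predictive relationship has changed, in line with the diagnostic use mentioned above.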