Concept drift refers to the phenomenon that the distribution generating the observed data changes over time. If drift is present, machine learning models may become inaccurate and need adjustment. Many techniques for learning with drift rely on the interleaved test-train error (ITTE) as a quantity that approximates the model's generalization error and triggers drift detection and model updates. In this work, we investigate to what extent this procedure is mathematically justified. More precisely, we relate a change in the ITTE to the presence of real drift, i.e., a changed posterior, and to a change in the training result under the assumption of optimality. We support our theoretical findings with empirical evidence for several learning algorithms, models, and datasets.
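To make the ITTE concrete, the following is a minimal sketch of the interleaved test-train procedure: each incoming sample is first used to test the current model and only afterwards to train it. The toy learner `TableClassifier`, the synthetic stream, the window size, and the threshold-based alarm are all assumptions made for illustration and are not taken from this work.

```python
from collections import deque


class TableClassifier:
    """Hypothetical toy learner: per-input label counts, predicts the majority label."""

    def __init__(self):
        self.counts = {}

    def predict(self, x):
        c = self.counts.get(x, [0, 0])
        return 1 if c[1] > c[0] else 0

    def learn(self, x, y):
        self.counts.setdefault(x, [0, 0])[y] += 1


def interleaved_test_train_error(stream, model, window=50):
    """Return the windowed ITTE after each sample (test first, then train)."""
    losses = deque(maxlen=window)
    errors = []
    for x, y in stream:
        losses.append(int(model.predict(x) != y))  # test on the not-yet-seen sample
        model.learn(x, y)                          # only then use it for training
        errors.append(sum(losses) / len(losses))
    return errors


# Synthetic stream with real drift: P(y | x) flips from y = x to y = 1 - x at t = 200.
stream = [(t % 2, t % 2) for t in range(200)]
stream += [(t % 2, 1 - t % 2) for t in range(200, 400)]

itte = interleaved_test_train_error(stream, TableClassifier())

# Assumed, simplistic alarm rule: first time the windowed ITTE exceeds 0.5.
alarm = next(t for t, e in enumerate(itte) if e > 0.5)
print(itte[199], itte[-1], alarm)
```

In this sketch the windowed error stays near zero before the change and rises afterwards, because the toy learner never forgets old samples. Whether such a rise is a reliable indicator of real drift in general is exactly the question the abstract raises.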