Detecting drift in data is essential for machine learning applications, because changes in the statistics of the processed data typically have a profound influence on the performance of trained models. Most available drift detection methods require access to true labels at inference time. In real-world scenarios, however, true labels are usually available only during model training. In this work, we propose a novel task-sensitive drift detection framework that is able to detect drift without access to true labels during inference. It uses metric learning to obtain a constrained low-dimensional embedding of the input data that is tailored to the classification task. It is able to detect real drift, where the drift affects classification performance, while properly ignoring virtual drift, where classification performance is not affected. In the proposed framework, the method used to detect a change in the statistics of incoming data samples can be chosen freely. We also propose two change detection methods, based on the exponential moving average and a modified $z$-score, respectively. We evaluate the proposed framework with a novel metric that combines the standard metrics of detection accuracy, false positive rate and detection delay into a single value. Experimental evaluation on nine benchmark datasets with different types of drift demonstrates that the proposed framework reliably detects drift and outperforms state-of-the-art unsupervised drift detection approaches.
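The abstract names the two change detectors but does not give their exact formulation. As a rough illustration only, the sketch below assumes that each incoming sample has already been reduced to a scalar score (for instance, a hypothetical distance to the nearest class centre in the learned embedding) and that a drift-free reference set of such scores is available. The EMA detector is written as a classic EWMA control chart, and the modified $z$-score follows the common Iglewicz and Hoaglin form $0.6745\,(x - \tilde{x})/\mathrm{MAD}$. The class names, window size, smoothing factor and thresholds are illustrative assumptions, not the authors' settings.

```python
import math
import statistics
from collections import deque
from typing import Iterable, Optional, Sequence

# Illustrative sketch only: assumes each sample is already reduced to a scalar
# score (hypothetical, e.g. an embedding-space distance); not the paper's code.


class EMADetector:
    """EWMA control chart: flag drift when the exponential moving average of the
    scores leaves a k-sigma band estimated from drift-free reference scores."""

    def __init__(self, reference: Sequence[float], alpha: float = 0.1, k: float = 3.0) -> None:
        self.mu = statistics.fmean(reference)
        sigma = statistics.pstdev(reference)
        # Steady-state std of an EMA of i.i.d. inputs is sigma * sqrt(alpha / (2 - alpha)).
        self.limit = k * sigma * math.sqrt(alpha / (2.0 - alpha))
        self.alpha = alpha
        self.ema = self.mu  # start the average at the reference mean

    def update(self, score: float) -> bool:
        self.ema = self.alpha * score + (1.0 - self.alpha) * self.ema
        return abs(self.ema - self.mu) > self.limit


class ModifiedZScoreDetector:
    """Flag drift when the median of the last `window` scores has a modified
    z-score, 0.6745 * (x - median) / MAD, larger than `threshold` in magnitude."""

    def __init__(self, reference: Sequence[float], window: int = 20, threshold: float = 3.5) -> None:
        self.median = statistics.median(reference)
        self.mad = statistics.median(abs(x - self.median) for x in reference)
        if self.mad == 0:
            raise ValueError("reference scores have zero MAD")
        self.recent: deque = deque(maxlen=window)
        self.threshold = threshold

    def update(self, score: float) -> bool:
        self.recent.append(score)
        if len(self.recent) < self.recent.maxlen:
            return False  # wait until the window is full
        m = 0.6745 * (statistics.median(self.recent) - self.median) / self.mad
        return abs(m) > self.threshold


def first_alarm(detector, scores: Iterable[float]) -> Optional[int]:
    """Index of the first score at which the detector signals drift, else None."""
    for i, s in enumerate(scores):
        if detector.update(s):
            return i
    return None


# Toy scores (hypothetical): a stable periodic regime, then a shift at index 30.
reference = [0.9, 1.0, 1.1] * 20
stream = [0.9, 1.0, 1.1] * 10 + [1.9, 2.0, 2.1] * 10

ema_alarm = first_alarm(EMADetector(reference), stream)
mz_alarm = first_alarm(ModifiedZScoreDetector(reference), stream)
print(ema_alarm, mz_alarm)
```

On this toy stream the EMA detector reacts at the first shifted sample (index 30), whereas the windowed median only moves once more than half of its 20-sample window has drifted, so that detector signals at index 40. This illustrates the usual trade-off between detection delay and robustness to isolated outliers, which is one reason a framework might let the change detector be swapped freely.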