Deployed machine learning models are confronted with the problem of changing data over time, a phenomenon also called concept drift. While existing approaches to concept drift detection already show convincing results, they require true labels as a prerequisite for successful drift detection. Especially in many real-world application scenarios, like the ones covered in this work, true labels are scarce and their acquisition is expensive. Therefore, we introduce a new algorithm for drift detection, Uncertainty Drift Detection (UDD), which is able to detect drifts without access to true labels. Our approach is based on the uncertainty estimates provided by a deep neural network in combination with Monte Carlo Dropout. Structural changes over time are detected by applying the ADWIN technique to the uncertainty estimates, and detected drifts trigger a retraining of the prediction model. In contrast to input data-based drift detection, our approach considers the effects of the current input data on the properties of the prediction model rather than detecting change in the input data only (which can lead to unnecessary retrainings). We show that UDD outperforms other state-of-the-art strategies on two synthetic as well as ten real-world data sets for both regression and classification tasks.
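The abstract's pipeline can be sketched in a few lines: run Monte Carlo Dropout to obtain a per-sample uncertainty estimate, then feed that uncertainty stream to a change detector, retraining when it fires. The sketch below is a minimal toy illustration, not the paper's implementation: the one-hidden-layer network with random weights, the window sizes, the threshold, and the `WindowDrift` detector (a crude stand-in for ADWIN, which the paper actually uses) are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_dropout_uncertainty(x, W1, W2, p=0.5, T=30):
    """Predictive std over T stochastic forward passes with dropout
    kept active at inference time (Monte Carlo Dropout)."""
    preds = []
    for _ in range(T):
        h = np.maximum(x @ W1, 0.0)              # ReLU hidden layer
        mask = rng.random(h.shape) > p           # dropout stays ON
        preds.append((h * mask / (1.0 - p)) @ W2)
    return float(np.std(preds))

class WindowDrift:
    """Toy stand-in for ADWIN: flags drift when the mean of a recent
    window of uncertainties departs from a fixed reference window."""
    def __init__(self, size=50, thresh=3.0):
        self.ref, self.cur = [], []
        self.size, self.thresh = size, thresh
    def update(self, u):
        if len(self.ref) < self.size:            # fill reference window first
            self.ref.append(u)
            return False
        self.cur.append(u)
        if len(self.cur) > self.size:
            self.cur.pop(0)
        return (len(self.cur) == self.size and
                abs(np.mean(self.cur) - np.mean(self.ref)) > self.thresh)

# UDD-style loop: monitor model uncertainty, signal retraining on drift.
W1, W2 = rng.normal(size=(4, 16)), rng.normal(size=16)
detector = WindowDrift()
drift_at = None
for t in range(400):
    scale = 1.0 if t < 150 else 5.0              # input distribution shifts at t=150
    x = rng.normal(scale=scale, size=4)
    if detector.update(mc_dropout_uncertainty(x, W1, W2)):
        drift_at = t                             # here UDD would retrain the model
        break
print(drift_at)
```

Note the key contrast with purely input-based detection: the monitored signal is the model's own predictive uncertainty, so a shift in the inputs only triggers retraining once it actually changes the model's behavior.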