The notion of concept drift refers to the phenomenon that the data-generating distribution changes over time; as a consequence, machine learning models may become inaccurate and need adjustment. In this paper, we consider the problem of detecting such change points in unsupervised learning. Many unsupervised approaches rely on the discrepancy between the sample distributions of two time windows. This procedure is noisy for small windows, hence prone to false positives, and it cannot handle more than one drift event within a window. Instead, we rely on structural properties of drift-induced signals, exploiting spectral properties of kernel embeddings of distributions. Based on this, we derive a new unsupervised drift detection algorithm, investigate its mathematical properties, and demonstrate its usefulness in several experiments.
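For context, the two-window baseline that the abstract contrasts against can be sketched as follows. This is an illustrative sketch of that baseline only, not the spectral method proposed in the paper. It assumes the maximum mean discrepancy (MMD), a common kernel-embedding discrepancy, as the window statistic; the Gaussian kernel, the bandwidth, the window size, and the helper names (`gaussian_kernel`, `mmd2`, `two_window_scores`) are hypothetical choices made for the example.

```python
import numpy as np

# Hypothetical illustration of the two-window baseline (not the paper's method):
# compare the kernel mean embeddings of the windows before and after each split.

def gaussian_kernel(a, b, bandwidth=1.0):
    # Pairwise RBF kernel k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2)).
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))

def mmd2(x, y, bandwidth=1.0):
    # Biased estimate of the squared MMD, i.e. the squared distance
    # between the empirical kernel mean embeddings of x and y.
    kxx = gaussian_kernel(x, x, bandwidth).mean()
    kyy = gaussian_kernel(y, y, bandwidth).mean()
    kxy = gaussian_kernel(x, y, bandwidth).mean()
    return kxx + kyy - 2.0 * kxy

def two_window_scores(stream, window=50, bandwidth=1.0):
    # For each split point t, compare stream[t-window:t] with stream[t:t+window].
    scores = {}
    for t in range(window, len(stream) - window + 1):
        scores[t] = mmd2(stream[t - window:t], stream[t:t + window], bandwidth)
    return scores

# Synthetic stream with a single mean shift (drift) at index 200.
rng = np.random.default_rng(0)
before = rng.normal(0.0, 1.0, size=(200, 2))
after = rng.normal(3.0, 1.0, size=(200, 2))
stream = np.vstack([before, after])

scores = two_window_scores(stream, window=50)
t_hat = max(scores, key=scores.get)  # split with the largest discrepancy
print("estimated drift point:", t_hat)
```

With a single, strong drift the estimated split lands near index 200. Shrinking `window` makes each `mmd2` estimate noisier, which is the source of the false positives mentioned above, and a window that straddles two drift events still yields only one split per comparison.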