Multi-label data streams are used in a growing number of real-world applications, which increases the need for algorithms that can deal with rapidly changing data. Changes in data distribution, also known as concept drift, cause existing classification models to rapidly lose their effectiveness. To assist the classifiers, we propose a novel algorithm called Label Dependency Drift Detector (LD3), an implicit (unsupervised) concept drift detector for multi-label data streams that uses the label dependencies within the data. LD3 exploits the dynamic temporal dependencies between labels through a label influence ranking method, which leverages a data fusion algorithm, and uses the resulting ranking to detect concept drift. LD3 is the first unsupervised concept drift detection algorithm in the multi-label classification problem area. We perform an extensive evaluation of LD3 on 12 datasets with a baseline classifier, comparing it against 14 prevalent supervised concept drift detection algorithms that we adapt to the problem area. The results show that LD3 provides between 19.8% and 68.6% better predictive performance than comparable detectors on both real-world and synthetic data streams.
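The idea of ranking labels by their mutual influence and monitoring that ranking for change can be illustrated with a minimal Python sketch. It is not the LD3 implementation: the co-occurrence-based influence measure, the use of reciprocal rank fusion as the data fusion step, the constant `k`, the Spearman footrule distance, the `threshold` value, and all function names are assumptions chosen for illustration. Each window is a list of binary label vectors (for example, labels predicted by a classifier, which is also an assumption here).

```python
from typing import List, Sequence

Window = Sequence[Sequence[int]]  # rows = instances, columns = binary labels


def cooccurrence(window: Window) -> List[List[int]]:
    """Count how often each pair of labels is active in the same instance."""
    n_labels = len(window[0])
    counts = [[0] * n_labels for _ in range(n_labels)]
    for row in window:
        active = [i for i, v in enumerate(row) if v]
        for i in active:
            for j in active:
                counts[i][j] += 1
    return counts


def fused_ranking(window: Window, k: int = 60) -> List[int]:
    """Fuse per-label rankings into one label ranking (illustrative only).

    Each label acts as a "voter" that ranks the other labels by how often
    they co-occur with it. The votes are combined with reciprocal rank
    fusion: score(label) = sum over voters of 1 / (k + rank).
    """
    counts = cooccurrence(window)
    n_labels = len(counts)
    fused = [0.0] * n_labels
    for voter in range(n_labels):
        others = [i for i in range(n_labels) if i != voter]
        # Highest co-occurrence first; ties broken by label index.
        others.sort(key=lambda i: (-counts[voter][i], i))
        for rank, label in enumerate(others, start=1):
            fused[label] += 1.0 / (k + rank)
    return sorted(range(n_labels), key=lambda i: (-fused[i], i))


def ranking_distance(a: Sequence[int], b: Sequence[int]) -> float:
    """Normalized Spearman footrule distance between two rankings, in [0, 1]."""
    pos_b = {label: p for p, label in enumerate(b)}
    n = len(a)
    total = sum(abs(p - pos_b[label]) for p, label in enumerate(a))
    max_total = (n * n) // 2  # footrule distance of a fully reversed ranking
    return total / max_total if max_total else 0.0


def drift_detected(old: Window, new: Window, threshold: float = 0.5) -> bool:
    """Flag drift when the fused label rankings of two windows differ enough."""
    return ranking_distance(fused_ranking(old), fused_ranking(new)) > threshold
```

Because the sketch compares only label vectors from two consecutive windows, it needs no ground-truth labels at detection time, which is the sense in which such a detector is implicit. The window size and threshold would have to be tuned per stream.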