For many AI systems, concept drift detection is crucial to ensuring the system's reliability. These systems often have to process large amounts of data or react in real time. Drift detectors must therefore meet computational requirements or constraints, and demonstrating that they do requires a comprehensive performance evaluation. So far, however, the development of drift detectors has focused on detection quality, e.g., accuracy, rather than on computational performance, such as running time. We show that previous work considers computational performance only as a secondary objective and lacks a benchmark for evaluating it. Hence, we propose a novel benchmark suite for drift detectors that accounts for both detection quality and computational performance to ensure a detector's applicability in various AI systems. In this work, we focus on unsupervised drift detectors, which do not depend on the availability of labeled data and are thus widely applicable. Our benchmark suite supports configurable synthetic and real-world data streams. Moreover, it provides means for simulating a machine learning model's output to unify the performance evaluation across different drift detectors. This allows a fair and comprehensive comparison of drift detectors proposed in related work. Our benchmark suite is integrated into the existing framework Massive Online Analysis (MOA). To evaluate our benchmark suite's capability, we integrate two representative unsupervised drift detectors. Our work enables the scientific community to establish a baseline for unsupervised drift detectors with respect to both detection quality and computational performance.