Time series anomaly detection (TSAD) is an important data mining task with numerous applications in the IoT era. In recent years, a large number of deep neural network-based methods have been proposed, demonstrating significantly better performance than conventional methods in addressing challenging TSAD problems in a variety of areas. Nevertheless, these deep TSAD methods typically rely on a clean training dataset that is not polluted by anomalies to learn the "normal profile" of the underlying dynamics. This requirement is nontrivial, since a clean dataset can hardly be provided in practice. Moreover, without awareness of their robustness, blindly applying deep TSAD methods to potentially contaminated training data can incur significant performance degradation in the detection phase. In this work, to tackle this important challenge, we first investigate the robustness of commonly used deep TSAD methods under contaminated training data, which provides a guideline for applying these methods when the provided training data are not guaranteed to be anomaly-free. Furthermore, we propose a model-agnostic method that can effectively improve the robustness of learning mainstream deep TSAD models from potentially contaminated data. Experimental results show that our method can consistently prevent or mitigate performance degradation of mainstream deep TSAD models on widely used benchmark datasets.