Subsequence anomaly detection in long sequences is an important problem with applications in a wide range of domains. However, the approaches proposed so far in the literature have severe limitations: they either require prior domain knowledge to design the anomaly discovery algorithms, or become cumbersome and expensive to use when anomalies of the same type recur. In this work, we address these problems and propose an unsupervised method suitable for domain-agnostic subsequence anomaly detection. Our method, Series2Graph, is based on a graph representation of a novel low-dimensionality embedding of subsequences. Series2Graph needs neither labeled instances (like supervised techniques) nor anomaly-free data (like zero-positive learning techniques), and identifies anomalies of varying lengths. The experimental results, on the largest set of synthetic and real datasets used to date, demonstrate that the proposed approach correctly identifies single and recurrent anomalies without any prior knowledge of their characteristics, outperforming several competing approaches in accuracy by a large margin, while being up to orders of magnitude faster. This paper appeared in VLDB 2020.