Anomaly detection is critical for finding suspicious behavior in a wide range of systems. Anomalies must be detected in real time, i.e., we must determine whether an incoming entity is anomalous as soon as we receive it, in order to minimize the effects of malicious activity and start recovery as early as possible. Online algorithms that detect anomalies in a streaming manner are therefore essential. We first propose MIDAS, which uses a count-min sketch to detect anomalous edges in dynamic graphs in an online manner, using constant time and memory. We then propose two variants: MIDAS-R, which incorporates temporal and spatial relations, and MIDAS-F, which filters out anomalous edges to prevent them from negatively affecting the internal data structures. Next, we extend the count-min sketch to a Higher-Order sketch that captures complex relations in graph data and reduces the problem of detecting suspicious dense subgraphs to that of finding a dense submatrix in constant time. Using this sketch, we propose four streaming methods to detect edge and subgraph anomalies. We then broaden the graph setting to multi-aspect data and propose MStream, which detects explainable anomalies in multi-aspect data streams. We further propose MStream-PCA, MStream-IB, and MStream-AE to incorporate correlation between features. Finally, we consider multi-dimensional data streams with concept drift and propose MemStream. MemStream uses a denoising autoencoder to learn representations and a memory module to learn the dynamically changing trend in the data without the need for labels. We prove a theoretical bound on the memory size required for effective drift handling. In addition, we allow quick retraining when the arriving stream becomes sufficiently different from the training data. Furthermore, MemStream makes use of two architecture design choices to be robust to memory poisoning.
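To make the count-min sketch idea concrete, the following is a minimal illustrative sketch of how a stream of edges can be scored in constant time and memory. It is not the authors' implementation. The class names (`CountMinSketch`, `StreamingEdgeScorer`), the parameters `depth` and `width`, and in particular the chi-squared-style scoring rule, which compares an edge's count in the current time tick with its historical mean, are assumptions chosen for illustration, since the abstract does not specify them.

```python
import random


class CountMinSketch:
    """Fixed-size approximate counter: `depth` hash rows of `width` buckets."""

    def __init__(self, depth=4, width=1024, seed=0):
        rng = random.Random(seed)
        self.width = width
        # One random salt per row acts as an independent hash function.
        self.salts = [rng.getrandbits(64) for _ in range(depth)]
        self.rows = [[0.0] * width for _ in range(depth)]

    def _buckets(self, key):
        return [hash((salt, key)) % self.width for salt in self.salts]

    def add(self, key, amount=1.0):
        for row, b in zip(self.rows, self._buckets(key)):
            row[b] += amount

    def estimate(self, key):
        # Collisions can only inflate a counter, so take the minimum over rows.
        return min(row[b] for row, b in zip(self.rows, self._buckets(key)))

    def clear(self):
        for row in self.rows:
            for i in range(self.width):
                row[i] = 0.0


class StreamingEdgeScorer:
    """Hypothetical scorer: flags edges that suddenly arrive in bursts."""

    def __init__(self, depth=4, width=1024):
        self.current = CountMinSketch(depth, width, seed=1)  # counts in this tick
        self.total = CountMinSketch(depth, width, seed=2)    # counts over all ticks
        self.tick = None

    def score(self, src, dst, t):
        """Score one edge; `t` is a 1-based, non-decreasing integer time tick."""
        if t != self.tick:
            # A new tick has started: reset the per-tick counts.
            self.current.clear()
            self.tick = t
        edge = (src, dst)
        self.current.add(edge)
        self.total.add(edge)
        a = self.current.estimate(edge)  # occurrences in the current tick
        s = self.total.estimate(edge)    # occurrences so far
        if t <= 1:
            return 0.0  # no history to compare against yet
        # Assumed scoring rule: squared deviation of the current count from
        # the historical mean s / t, normalized chi-squared style.
        return (a - s / t) ** 2 * t * t / (s * (t - 1))


if __name__ == "__main__":
    scorer = StreamingEdgeScorer()
    for t in range(1, 6):
        print("steady", t, scorer.score(1, 2, t))
    for _ in range(20):
        last = scorer.score(1, 2, 6)
    print("after burst", last)
```

Under these assumptions, an edge that arrives once per tick keeps a score of zero, while a burst of the same edge within a single tick drives the score up. Memory use is fixed by `depth` and `width` regardless of how many distinct edges appear in the stream.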